Proc. Natl. Acad. Sci. USA Vol. 89, pp. 11886-11890, December 1992 Evolution

Major histocompatibility complex class II genes of zebrafish

HIDEKI ONO†‡, DAGMAR KLEIN§, VLADIMIR VINCEK§, FELIPE FIGUEROA†, COLM O'HUIGIN†, HERBERT TICHY†, AND JAN KLEIN†§

†Max-Planck-Institut für Biologie, Abteilung Immungenetik, 7400 Tübingen, Federal Republic of Germany; and §Department of Microbiology and Immunology, University of Miami School of Medicine, Miami, FL 33101

Communicated by Max D. Cooper, August 21, 1992 (received for review July 17, 1992)

ABSTRACT Twenty cDNA clones derived from β-chain-encoding class II genes of the zebrafish (Brachydanio rerio) major histocompatibility complex (MHC) have been sequenced. They fall into three groups identifying three loci of expressed genes. The length and organization of these genes are similar to those of their mammalian homologs. Amplification by polymerase chain reaction and sequencing of genomic DNA from zebrafish collected at different locations in India indicate the existence of a fourth group of sequences (fourth locus). A high degree of polymorphism at the B. rerio MHC loci and concentration of variability in the putative peptide-binding region of the β1-domain-encoding part of the gene are also indicated. Large genetic distances between alleles suggest trans-specific evolution of fish MHC polymorphism. Zebrafish genes appear to be derived from a different ancestor than the various class II gene families of other vertebrates. In spite of great sequence divergence between fish and mammalian MHC genes, there seems to be a striking conservation in their overall organization.

The mammalian major histocompatibility complex (MHC) is a cluster of genes that encode cell-surface receptors capable of binding short self and nonself peptides (1, 2). MHC receptors occupied by nonself peptides are in turn recognized by T-cell receptors of thymus-derived lymphocytes, and this recognition initiates the specific immune response. The MHC molecules are heterodimers consisting of α and β polypeptide chains. The molecules fall into two classes (I and II), which differ in their structure, tissue distribution, and function. The α and β chains of the class II molecules are both membrane bound, they are predominantly expressed on lymphocytes, and they preferentially present nonself peptides to a subset of T lymphocytes, the helper T cells. In the class I molecules, on the other hand, only the α chain is membrane bound, the molecules are ubiquitously distributed, and they preferentially present antigenic peptides to another subset of T lymphocytes, the cytotoxic T cells.

A typical class II gene consists of exons encoding the leader peptide, the two external (α1 and α2 or β1 and β2) domains, the transmembrane region, and the cytoplasmic tail. The peptide-presenting function is carried out by a specialized peptide-binding region (PBR) assembled from the amino acid residues of the α1 and β1 domains and encoded in exon 2 of the corresponding genes (3, 4). The PBR sites of the functional MHC genes vary considerably among individuals of the same species. This MHC polymorphism is characterized by two features: a large number of alleles at a given functional locus and a large number of nucleotide differences between some of the alleles (5). Recent data indicate that the polymorphic differences accumulate over periods of time much longer than the life span of a species and that they are passed on from ancestral to emerging species during speciation (6, 7). The trans-species MHC polymorphism is driven by positive selection at the PBR sites (8, 9).

In addition to mammals, MHC genes have also been isolated in birds (10, 11), amphibians (12), reptiles (13), bony fishes (14), and cartilaginous fishes (15, 16). However, information about the functionally important features of the nonmammalian genes is lacking. It is not known, for example, whether the MHC genes of lower vertebrates function in the same way as mammalian genes or whether different vertebrate classes have found different solutions for the mechanism of antigen presentation. This question is particularly pertinent in the face of data indicating that different vertebrate classes differ in the manner in which they generate diversity of immunoglobulin molecules, which are evolutionarily and functionally related to the MHC molecules (17). In the present study, we have therefore attempted to answer this question by characterizing the class II genes of a teleost (bony fish), Brachydanio rerio, the zebrafish.¶ We have chosen this particular species because it offers many advantages for genetic analysis and is rapidly becoming a popular experimental model in a variety of studies (18).

MATERIALS AND METHODS

Fishes. Zebrafish (B. rerio) were obtained from a dealer (Aquarium Pelz, Tübingen, F.R.G.) and from a laboratory stock at the Max-Planck-Institut für Entwicklungsbiologie (Tübingen, F.R.G.). Additional specimens were collected in rivers during an expedition to India in 1991.

Probes and Primers. The 5E1 probe is the PCR product obtained with the primers Tu45 and Tu46 and subcloned into the Bluescript II EcoRV-digested vector. The BC122a probe is a 328-base-pair (bp) EcoRI/Kpn I fragment derived from cDNA class II clone BC122 and encompassing exons 3-5 of the zebrafish class II gene. The BC1241 probe is a cDNA clone encompassing most of exon 2 and the entire 3' part of the coding sequence. The PCR primers were as follows: Tu45, 5'-TG(T/C)(C/A)(G/T)(G/T)G(T/C)C(A/T)(C/A)TG(G/A)(T/C)TTCTA(T/C)CC-3'; Tu46, 5'-(A/G)(A/G)GCTG(G/C)(T/C)GTG(C/A)(T/A)(T/C)CAC(A/C)(T/A)(G/C)(A/G)CA-3'; Tu385, 5'-TGCTGTCG(A/G)CATTTACTGGAAC-3'; Tu360, 5'-TGCTTTATCACG(G/T)ACAGCTGA-3'; Tu386, 5'-CCAGAGGTAACAATC(C/A)AGTCAGTGA-3'; and Tu387, 5'-CCATTCTTTAGTAA(T/G)AGGTTGA-3'. The Tu45 and Tu46 primers, based on carp sequence (14), correspond to exon 3 codons 118-125 and 174-181 of the zebrafish class II MHC genes. All other primers were designed on the basis of the zebrafish class II sequences. Primer Tu385 is identical to the nucleotide sequence of exon 1 codons -5 to +3; Tu360 is complementary to codons 89-95 of exon 2; Tu386 corresponds to codons 98-106 of exon 3; and Tu387 is the reverse sequence of exon 3 codons 182-189. Methods of cDNA library construction and screening, as well as DNA sequencing, are described elsewhere (19-24).

Construction of Dendrograms. The inferred protein sequences of Brachydanio cDNA clones were aligned to other vertebrate MHC class II sequences using the CLUSTAL V computer program (25). The percentage amino acid identity between aligned sequences was used to construct distance matrices. The evolutionary relationships were then evaluated by neighbor-joining analysis (26). One hundred bootstrap replications were performed to determine the reliability of the branching order.

Abbreviations: MHC, major histocompatibility complex; PBR, peptide-binding region.
‡Permanent address: Department of Dermatology, Yokohama City University School of Medicine, 3-9 Fukuura, Kanazawa-ku, Yokohama 236, Japan.
¶The sequences reported in this paper have been deposited in the GenBank data base (accession nos. L04805-L04824).

The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.
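The degenerate primers listed under Probes and Primers use the notation (X/Y) for a position at which either base was synthesized. As a minimal sketch of how that notation can be read, the following Python snippet parses a primer string, counts its degeneracy (the number of distinct oligonucleotides in the mix), and expands it. The function names are hypothetical and chosen for illustration only; the primer strings are copied from the text.

```python
import itertools
import re

# Primer strings copied from Materials and Methods; (X/Y) marks a degenerate position.
TU385 = "TGCTGTCG(A/G)CATTTACTGGAAC"
TU45 = "TG(T/C)(C/A)(G/T)(G/T)G(T/C)C(A/T)(C/A)TG(G/A)(T/C)TTCTA(T/C)CC"


def parse_primer(spec):
    """Return one tuple of allowed bases per primer position (hypothetical helper)."""
    spec = re.sub(r"\s+", "", spec)  # tolerate line-wrap spaces such as "(C/ A)"
    tokens = re.findall(r"\(([ACGT](?:/[ACGT])+)\)|([ACGT])", spec)
    return [tuple(alt.split("/")) if alt else (base,) for alt, base in tokens]


def degeneracy(spec):
    """Number of distinct sequences represented by the degenerate primer."""
    total = 1
    for options in parse_primer(spec):
        total *= len(options)
    return total


def expand(spec):
    """List every concrete sequence encoded by the degenerate primer."""
    return ["".join(bases) for bases in itertools.product(*parse_primer(spec))]


if __name__ == "__main__":
    print(len(parse_primer(TU45)), degeneracy(TU45))
    print(expand(TU385))
```

Expanding is only practical for primers with few degenerate positions; for a primer such as Tu45 the degeneracy count alone is usually the more useful number.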

RESULTS

Characterization of cDNA Clones. To obtain clones derived from zebrafish class II β-chain-encoding genes, we screened two cDNA libraries. The screening of the first library with probe 5E1 produced 21 positive clones, of which 15 were positive on second screening. Of these, we sequenced 8 clones completely, but they all turned out to be truncated and hence provided only partial coding sequences. Screening of the second library with probe BC122a produced 55 positive clones, of which 38 were positive on second screening. Of these, we sequenced 12 clones, 5 of which encompassed almost the entire coding sequence. Altogether we sequenced 20 cDNA clones (Figs. 1 and 2). Based on sequence similarity, the 20 clones could be divided into three groups with intergroup similarity of near complete sequences ranging from 88% to 90% and intragroup similarity ranging from 99% to 100%. Group 1 contained 13 clones (B1-1, B8-3, B6-3, B4-2, B4-4, BC1241, B10-1, B12-6, B13-1, B4-1, BC511, BC1231, and BC1031), all of which had identical sequences in their regions of overlap. Group 2 contained two clones (B9-2 and BC551), which again had identical sequences in their overlap regions. Group 3 contained two pairs of identical clones (B3-2, B2-3; and BC122, BC331), as well as clone BC111. The presence of a fourth group is inferred from the intron sequences (see below). We assume that the groups represent four different loci, which we designate Brre-DAB1 (group 1), Brre-DAB2 (group 2), Brre-DAB3 (group 3), and Brre-DAB4 (group 4). In these symbols, Brre designates B. rerio (27), D is for class II, A is for the particular family of loci, and B specifies that the loci encode β chains. Alleles at these loci are designated by two-digit serial numbers separated from the locus symbol by an asterisk.

Group  Clone    Length (bp)
3      B3-2     1400
3      BC111    1028
3      BC122    953
3      BC331    895
2      B9-2     1392
2      BC551    545
1      B1-1     1358
1      B8-3     1354
1      B6-3     1312
1      B4-2     946
1      B4-4     915
1      BC511    555
1      BC1231   875
1      BC1031   911
1      BC1241   1190

FIG. 1. Length of zebrafish cDNA clones analyzed in the present study. Top line indicates known or postulated position of exons (see text). L, leader peptide; β1 and β2, β1 and β2 domains; TM, transmembrane region; CY, cytoplasmic tail; (A)n, poly(A) tail. Informative restriction sites are indicated. Length is expressed in bp. In addition to the listed clones, which were fully sequenced, clones B2-3, B4-1, B10-1, B12-6, and B13-1 were partially sequenced; the first of these was identical to B3-2, whereas the remaining four were identical to group 1 clones.

Exon-intron junctions (indicated by vertical lines in Fig. 2) have been deduced from comparisons with class II sequences of other vertebrates, with the carp sequence of Hashimoto and co-workers (14), and with the zebrafish class II genomic sequence (see below; H. Sültmann, W. E. Mayer, and J.K., unpublished data). As in other vertebrates, exon 1 encodes a string of hydrophobic amino acid residues (specified by codons -1 to -16), presumably the leader peptide, as well as three nonhydrophobic residues, presumably the beginning of the β1 domain. Assuming that the ATG at the 5' end of the cDNA clones is the initiation codon, the putative leader peptide of the Brre-DAB molecules would be only 16 residues long, which is shorter than that of most other class II genes (1). It is possible, however, that there is another initiation codon upstream from -16 that would make the Brre-DAB leader peptide longer. Exons 1 and 2 are separated by intron 1, which is 200-270 bp long (see below).
Exon 2 encodes the remainder of the β1 domain (residues 4-95) and exon 3 encodes the β2 domain (residues 96-189). The rest of the coding sequence specifies a short connecting peptide (residues 190-201), followed by a hydrophobic segment of 19 residues, presumably the transmembrane region (specified by codons 202-220), and then by 14 largely hydrophilic residues, presumably the cytoplasmic tail. The 3' untranslated region is ≈650 bp long, ending with a canonical polyadenylylation signal (AATAAA), followed after an additional 11-18 bp by the poly(A) tail.

Polymorphism of the Brre-DAB Genes. Genomic DNA was isolated from different stocks of zebrafish and from fish collected at different localities in India. The DNA was amplified by PCR using primers Tu385 and Tu360, as well as Tu386 and Tu387. The former pair primed amplification of sequences extending from the 3' end of exon 1 to the 3' end of exon 2 and included the entire intron 1, whereas the latter pair amplified sequences encompassing most of exon 3. The amplified fragments were subcloned and sequenced (Fig. 2). The intron 1 sequences (Fig. 2) support the assignment of the coding sequences to four loci: they fall into four distinct groups, each group characterized by a specific set of deletions and insertions, as well as high intragroup sequence similarity. The exon 2 sequences are the most variable of the entire coding segment. To evaluate the distribution of variability, a Wu-Kabat plot was produced (Fig. 3). Within the exon, the variability is concentrated into three regions (residues 9, 11, and 13; residues 27, 29, 31, and 36; and residues 68, 71, 72, and 75). Of the seven most variable residues, six correspond to residues thought to constitute part of the PBR of mammalian genes (4). Sixty-five percent of the variability found in the second exon is associated with residues of the putative PBR. This degree of association is similar to that found in functional mammalian MHC class II genes.

Putatively allelic exon 2 sequences differ by multiple nucleotide substitutions (e.g., the Brre-DAB2*01 and Brre-DAB2*04 alleles differ by 24 substitutions). Large allelic distances are characteristic of MHC genes in higher vertebrates, where they are indicative of trans-species evolution of MHC polymorphism (6, 7). It is therefore likely that fish MHC alleles, too, evolve trans-specifically. Compared to exon 2, only limited variability is detectable in exon 3 and the rest of the sequence (Fig. 2), an observation that underscores the concentration of polymorphism in the putative PBR in exon 2 of the Brre-DAB genes.

Phylogenetic Evaluation. In a genetic distance dendrogram based on amino acid sequences of class II β chains from representatives of different vertebrate classes (Fig. 4), the Brre polypeptides form a distinct branch that is not directly related to any other known families of class II genes. The simplest interpretation of this result is that the Brre genes originated from a different ancestral gene than the various class II β-chain-encoding genes of amphibians, birds, and mammals.

FIG. 2. Nucleotide sequences of zebrafish cDNA clones and PCR fragments. Identity with the DAB1*01 or DAB3*02 sequence is indicated by dashes, unavailability of information is indicated by dots, and deletions (insertions) are indicated by asterisks. Known or postulated exon borders are indicated by vertical lines. The putative polyadenylylation signal is boxed. Origins of donors of individual alleles: DAB1*01, DAB1*01-G1 and -G2, DAB2*01, DAB3*01, DAB4*01, and DAB4*02, derived from the Tübingen stock; DAB2*01-G1, DAB2*02, DAB2*03, DAB2*04, DAB3*02, and DAB4*05, from one region in North Bengal, India; DAB3*03 and DAB3*04, from another region in North Bengal; DAB4*03, *04, *06, as well as DAB2*05, from Calcutta, India. DAB1*01-G1 and -G2 are genomic sequences identical in exons 2 and 3, respectively, to the cDNA clone B1-1. DAB2*01-G1 is a genomic sequence identical to the cDNA clone B9-2. DAB2*03-G1 and -G2 are genomic sequences identical to each other in their exons but with different introns.
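The Wu-Kabat variability used to locate the variable residues of exon 2 is defined in the legend to Fig. 3 as the number of different amino acid residues at a site divided by the frequency of the most common residue at that site (28). A minimal Python sketch of that calculation follows. The function names are hypothetical, the toy sequences are invented for illustration and are not Brre-DAB data, and the choice to ignore gap ('-') and missing-data ('.') characters is an assumption rather than something stated in the text.

```python
from collections import Counter


def wu_kabat(column):
    """Variability of one alignment column: distinct residues / frequency of the commonest."""
    residues = [r for r in column if r not in "-."]  # assumption: skip gaps and missing data
    if not residues:
        return 0.0
    counts = Counter(residues)
    top_frequency = counts.most_common(1)[0][1] / len(residues)
    return len(counts) / top_frequency


def variability_profile(aligned_sequences):
    """Wu-Kabat variability for every column of equal-length aligned sequences."""
    lengths = {len(s) for s in aligned_sequences}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to the same length")
    return [wu_kabat(column) for column in zip(*aligned_sequences)]


if __name__ == "__main__":
    toy_alignment = ["DGY", "DAY", "DSF", "DGF"]  # hypothetical residues, not from Fig. 2
    print(variability_profile(toy_alignment))
```

An invariant site scores 1, and the score rises both with the number of different residues and with how evenly they are distributed, which is why a few PBR positions stand out in a plot of this kind.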

DISCUSSION

Although the amino acid sequence similarity between fish and mammalian MHC class II proteins (DR, DP, DO, DQ) is low (26-34%), in all other respects the correspondence between them is striking. The genes (and presumably the encoded proteins as well) are organized in the same way, sequence variability is focused in similar short regions, and the polymorphism is apparently evolving in a similar manner. The concentration of polymorphism into variable regions (presumably PBR) of the β1 domain in the fish genes in turn implies that the function of these genes is probably the same as that of the mammalian MHC: initiation of immune responses against parasites. Hence, except for their sequence, the MHC genes have changed very little during their 800 million years of evolution that separate extant bony fishes from extant mammals (38). This conservatism stands in striking contrast to the evolution of the immunoglobulins, which are the second of the three important players in vertebrate immunity. Not only do different vertebrate taxa possess different immunoglobulin molecules, they also differ fundamentally in the organization of their genes, in the manner in which they generate antibody diversity, and in the specificity of their antibody molecules (17).

Sequence variability has apparently been an important feature in the evolution of MHC genes. It is achieved not only by diversification of existing genes, but also by repeated cycles of expansion through gene duplications and contractions through gene deletions. The close relationship among identified Brre loci suggests that they originated by duplication from a common ancestral gene. This ancestor was, however, apparently different from the ancestors that gave rise to β-chain-encoding class II genes of other vertebrate classes. Thus, in eutherian (placental) mammals, the class II region of the MHC consists of five β-chain-encoding gene families: DMB, DOB, DPB, DQB, and DRB (39). Several β-chain-encoding gene families also exist in marsupials (36), which separated from eutherian mammals some 120 million years ago (38), but the expressed marsupial MHC genes are equidistant from all the eutherian genes (36), indicating that they arose from different ancestors (40). Similarly, no orthologous relationship can be found between avian class II β-chain-encoding genes and either marsupial or eutherian genes (40, 41), or between amphibian, avian, and mammalian genes (16, 40). In each of these vertebrate (sub)classes, therefore, the expressed class II β-chain-encoding genes apparently originated from a separate ancestral element by gene duplication. It seems, therefore, that expansion and contraction are important means of adapting the MHC genes to the needs posed by the changing environment.

FIG. 3. Wu-Kabat variability plot of Brre β1 domain amino acid residues translated from the nucleotide sequences in Fig. 2. Variability score (ordinate) is given by the number of different amino acid residues/frequency of the most common residue for each site (28). Horizontal axis shows position of each site in the Brachydanio sequences. Solid bars indicate residues corresponding to the PBR of the mammalian class II β chains (4). Apparent insertion has occurred at residue 23 of Brachydanio sequences relative to their mammalian counterparts. In the three-dimensional model of the class II molecule (4), this residue is located in a loop region and therefore does not affect the structure of the PBR. Because of the location of the intron 1/exon 2 splice site, the β1 domain of Brre-DAB molecules is lengthened by an additional amino acid residue.

FIG. 4. Phylogenetic tree of representative class II β chains from different vertebrate classes. Branches shown: Mouse (H-2Mb), Human (HLA-DMB), Dog (Cafa-DRB), Human (HLA-DRB), Wallaby (Maru-DAB1), Pig (Susc-DQB), Human (HLA-DPB1), Wallaby (Maru-DBB), Human (HLA-DOB), Fowl (Gaga-B-LB), Clawed toad (Xele), Zebrafish (Brre-DAB4*01), Zebrafish (Brre-DAB1*01), and Zebrafish (Brre-DAB2*01). Sequence sources: HLA-DOB (29), HLA-DPB (30), Gaga-B-LB (11), Susc-DQB (31), Cafa-DRB (32), HLA-DRB (33), HLA-DMB (34), H-2-Mb (35), Maru-DBB (36), and Xele (37). Scale shows genetic distances expressed as proportion of nonidentical amino acids. Numbering on nodes shows number of times a particular branch is recovered per 100 bootstrap replications.
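The genetic distances of Fig. 4, like the 26-34% similarity quoted above, are based on the proportion of nonidentical amino acids between aligned sequences. The Python sketch below shows one way such a pairwise distance matrix could be computed before it is passed to a tree-building method such as neighbor joining (26). It is not the CLUSTAL V or neighbor-joining code used in the study; the function names and toy sequences are hypothetical, and the decision to skip positions at which either sequence has a gap is an assumption.

```python
def p_distance(seq_a, seq_b):
    """Proportion of nonidentical residues over positions where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not compared:
        raise ValueError("no comparable positions")
    differences = sum(a != b for a, b in compared)
    return differences / len(compared)


def distance_matrix(aligned):
    """Return (names, matrix) of pairwise p-distances for a dict mapping name to sequence."""
    names = list(aligned)
    matrix = [[p_distance(aligned[i], aligned[j]) for j in names] for i in names]
    return names, matrix


if __name__ == "__main__":
    # Hypothetical toy alignment, not sequences from the paper.
    toy = {"seqA": "MKTLLV", "seqB": "MKTLIV", "seqC": "MRSL-V"}
    names, matrix = distance_matrix(toy)
    for name, row in zip(names, matrix):
        print(name, [round(d, 3) for d in row])
```

Percentage identity is simply 100 × (1 − distance), so the same matrix serves both the similarity figures in the text and the scale of the dendrogram.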

We thank Lynne Yakes for editorial assistance, Anica Milosev for preparation of graphics, and Holger Sultmann for advice and sharing of unpublished data. The work was supported in part by Grant AI 23667 from the National Institutes of Health (Bethesda, MD).

1. Klein, J. (1986) Natural History of the Major Histocompatibility Complex (Wiley, New York).
2. Bjorkman, P. J. & Parham, P. (1990) Annu. Rev. Biochem. 59, 253-288.
3. Bjorkman, P. J., Saper, M. A., Samraoui, B., Bennett, W. S., Strominger, J. L. & Wiley, D. C. (1987) Nature (London) 329, 512-518.
4. Brown, J. H., Jardetzky, T., Saper, M. A., Samraoui, B., Bjorkman, P. J. & Wiley, D. C. (1988) Nature (London) 332, 845-850.
5. Klein, J. & Figueroa, F. (1986) CRC Crit. Rev. Immunol. 6, 295-386.
6. Klein, J. (1980) in Immunology 80, eds. Fougereau, M. & Dausset, J. (Academic, London), pp. 239-253.
7. Klein, J., Satta, Y., O'hUigin, C. & Takahata, N. (1993) Annu. Rev. Immunol. 11, in press.
8. Klitz, W., Thomson, G. & Baur, M. P. (1986) Am. J. Hum. Genet. 39, 340-349.
9. Hughes, A. L. & Nei, M. (1988) Nature (London) 335, 167-170.
10. Bourlet, Y., Béhar, G., Guillemot, F., Frechin, N., Billault, A., Chaussé, A.-M., Zoorob, R. & Auffray, C. (1988) EMBO J. 7, 1031-1039.
11. Kroemer, G., Zoorob, R. & Auffray, C. (1990) Immunogenetics 31, 405-409.
12. Flajnik, M. F., Canel, C., Kramer, J. & Kasahara, M. (1991) Proc. Natl. Acad. Sci. USA 88, 537-541.
13. Grossberger, D. & Parham, P. (1992) Immunogenetics 36, 166-174.
14. Hashimoto, K., Nakanishi, T. & Kurosawa, Y. (1990) Proc. Natl. Acad. Sci. USA 87, 6863-6867.
15. Hashimoto, K., Nakanishi, T. & Kurosawa, Y. (1992) Proc. Natl. Acad. Sci. USA 89, 2209-2212.
16. Kasahara, M., Vazquez, M., Sato, K., McKinney, E. C. & Flajnik, M. F. (1992) Proc. Natl. Acad. Sci. USA 89, 6688-6692.
17. Amemiya, C. T. & Litman, G. W. (1991) Am. Zool. 31, 558-569.
18. Laale, H. W. (1977) J. Fish Biol. 10, 121-173.
19. Ono, H., Figueroa, F., O'hUigin, C. & Klein, J. (1993) Immunogenetics, in press.
20. Vincek, V., Sertic, J., Zaleska-Rutczynska, Z., Figueroa, F. & Klein, J. (1990) Immunogenetics 31, 45-51.
21. Feinberg, A. P. & Vogelstein, B. (1983) Anal. Biochem. 132, 6-13.
22. Holmes, D. S. & Quigley, M. (1981) Anal. Biochem. 114, 193-197.
23. Sanger, F., Nicklen, S. & Coulson, A. R. (1977) Proc. Natl. Acad. Sci. USA 74, 5463-5467.
24. Davis, L. G., Dibner, M. D. & Battey, J. F. (1986) Basic Methods in Molecular Biology (Elsevier, New York).
25. Higgins, D. G., Bleasby, A. J. & Fuchs, R. (1992) Comput. Appl. Biosci. 8, 189-191.
26. Saitou, N. & Nei, M. (1987) Mol. Biol. Evol. 4, 406-425.
27. Klein, J., Bontrop, R. E., Dawkins, R. L., Erlich, H. A., Gyllensten, U. B., Heise, E. R., Jones, P. P., Parham, P., Wakeland, E. K. & Watkins, D. I. (1990) Immunogenetics 31, 217-219.
28. Wu, T. T. & Kabat, E. A. (1970) J. Exp. Med. 132, 211-250.
29. Jonsson, A. K. & Rask, L. (1989) Immunogenetics 29, 411-413.
30. Kelly, A. & Trowsdale, J. (1985) Nucleic Acids Res. 13, 1607-1621.
31. Gustafsson, K., LeGuern, C., Hirsch, F., Pratt, K. & Sachs, D. H. (1990) J. Immunol. 145, 1946-1951.
32. Sarmiento, U. M. & Storb, R. (1990) Immunogenetics 31, 396-399.
33. Young, J. A. T., Wilkinson, D., Bodmer, W. F. & Trowsdale, J. (1987) Proc. Natl. Acad. Sci. USA 84, 4929-4933.
34. Kelly, A. P., Monaco, J. J., Cho, S. & Trowsdale, J. (1991) Nature (London) 353, 571-573.
35. Cho, S., Attaya, M. & Monaco, J. J. (1991) Nature (London) 353, 573-576.
36. Schneider, S., Vincek, V., Tichy, H., Figueroa, F. & Klein, J. (1991) Mol. Biol. Evol. 8, 753-766.
37. Sato, K., Flajnik, M. F., Du Pasquier, L., Katagiri, M. & Kasahara, M. (1992) J. Immunol., submitted.
38. Carroll, R. L. (1988) in Vertebrate Paleontology and Evolution (Freeman, New York), p. 3.
39. Carson, S. & Trowsdale, J. (1986) Oxford Surv. Euk. Genes 3, 63-94.
40. Klein, J., Ono, H., Klein, D. & O'hUigin, C. (1993) in Progress in Immunology, eds. Gergely, J. & Petranyi, G. (Springer, Heidelberg), Vol. 8, in press.
41. Xu, Y., Pitcovski, J., Peterson, L., Auffray, C., Bourlet, Y., Gerndt, B. M., Nordskog, A. W., Lamont, S. J. & Warner, C. M. (1989) J. Immunol. 142, 2122-2132.
