Gene, 112 (1992) 163-170 © 1992 Elsevier Science Publishers B.V. All rights reserved. 0378-1119/92/$05.00


GENE 06320

Conservation of antifreeze protein-encoding genes in tandem repeats
(Multigene family; winter flounder; nucleotide sequence; gene conversion; recombination)

Peter L. Davies
Department of Biochemistry, Queen's University, Kingston, Ontario, K7L 3N6 (Canada)

Received by J.L. Slightom: 9 June 1991; Accepted: 28 September 1991; Received at publishers: 20 December 1991

SUMMARY
The antifreeze protein (AFP) multigene family in winter flounder is made up of genes in tandem repeats and others that are linked but irregularly spaced. The close spacing of the tandemly repeated genes has provided an opportunity to determine how closely nearest-neighbour AFP genes from the tandem repeats resemble each other, the genes elsewhere in the tandem repeats, and the genes outside the tandem repeats. Four pairs of tandemly repeated genes were sequenced. Two pairs were identical and probably represent independent clones of the same chromosomal region. Each of the six unique genes coded for one of the two major AFPs, HPLC-6 or HPLC-8. Their comparison over a region of 900 bp, from just after the CAAT box to just before the polyadenylation signal, showed a maximum of 26 single-bp changes and no major insertions or deletions. Nearest-neighbour genes had almost as many changes as genes elsewhere in the tandem repeats. However, these six genes were much more homogeneous than five AFP genes from outside the tandem repeats, none of which encodes a major AFP component and all but one of which have large insertions, deletions, or rearrangements. Despite the similarity of the genes in tandem repeats, three variants due to short insertions or deletions in the 3'-flanking DNA were interspersed, and genes coding for HPLC-6 and HPLC-8 were nearest neighbours. It is suggested that enhanced opportunities for recombination between repeats have helped keep these genes more uniform than those outside the tandem repeats. Potential 'hot-spots' for recombination are the oligo(dG) tract in the intron, which shows length variation from gene to gene, and the 33-bp repeats in the 3'-exon.

Correspondence to: Dr. P.L. Davies, Department of Biochemistry, Queen's University, Kingston, Ontario, K7L 3N6 (Canada). Tel. (613) 545-2983; Fax (613) 545-6612.

Abbreviations: aa, amino acid(s); AFP, antifreeze protein; AFP, gene (DNA) encoding AFP; bp, base pair(s); cDNA, DNA complementary to mRNA; HPLC, high-performance liquid chromatography; kb, kilobase(s) or 1000 bp; nt, nucleotide(s); RFLP, restriction fragment length polymorphism; tsp, transcription start point(s).

INTRODUCTION
Antifreeze proteins (AFP) are produced by a number of marine teleosts, in which they serve to lower the freezing point of the serum and thereby that of the whole fish (DeVries, 1983; Fletcher et al., 1986; Davies et al., 1988). At present, four different types of macromolecular antifreezes (three AFP and one antifreeze glycoprotein) have been described (Davies and Hew, 1990). Despite their structural dissimilarity, they each act non-colligatively to depress the freezing point of aqueous solutions by binding to the prism faces of ice crystals and arresting growth of the crystal in the thermodynamically preferred direction, along the a-axes. As the solution temperature is lowered further and crystal growth resumes, expansion along the c-axis is favoured. The magnitude of the freezing point depression depends on AFP concentration, but the relationship is non-linear: unit increases in AFP concentration produce less than proportional increases in freezing point depression (DeVries, 1983; Davies et al., 1988). Typical AFP concentrations needed to protect fish at the freezing point of sea water (-1.9°C) are 10-20 mg/ml.

Type-I AFPs are Ala-rich, amphipathic α-helices. In winter flounder (Pseudopleuronectes americanus) this AFP is produced seasonally by the liver (Lin and Long, 1980; Pickett et al., 1983; Fourney et al., 1984a) as a preproprotein (Davies et al., 1982; Pickett et al., 1984). HPLC analysis of the mature, circulating AFP shows that it is a mixture of independently active components (Fourney et al., 1984b). Two major components, HPLC-6 and -8, together make up 90% of the AFP fraction. They are each 37 aa long and contain three 11-aa repeats of the sequence Thr-Alx-Alx-Asx-Alx-Alx-Alx-Alx-Alx-Alx-Alx, where Alx represents Ala or another aa that favours α-helix formation and Asx represents Asp or Asn. Based on their aa composition, several minor AFP components seem to be simple variants of HPLC-6 and -8.

Consistent with the multiplicity of antifreeze components detected by HPLC, genomic Southern blots have shown that the AFPs are encoded by a multigene family (Davies et al., 1984; Gourlie et al., 1984). Winter flounder from the waters around Newfoundland have as many as 30-40 copies of the AFP gene. Approximately two thirds of the genes are present in tandem, direct 8-kb repeats, with one gene per repeat (Scott et al., 1985). The others are linked but irregularly spaced (Davies et al., 1984). Five genes from the latter category have been sequenced. One (E3) is a close match to a gene encoding HPLC-6 but with an Ala-to-Asp codon replacement. Another (21a) encodes a more unusual variant with a shortened pro-region but a larger mature protein section containing four 11-aa repeats (Gauthier et al., 1990). The coding regions of this gene are a close match to one of the AFP cDNA sequences reported by Gourlie et al. (1984). Both genes have all the appropriate regulatory and processing signals, and although their products have not yet been identified in flounder serum, nothing indicates that the genes are non-functional. Three other genes (11-3, F2, and 5a) are more extreme variants and appear to be pseudogenes (Davies and Gauthier, 1992).

Prior to this report, one of the genes in the tandem repeat arrangement was sequenced (Scott et al., 1988a). It is approx. 1 kb long and is divided into two exons. The 5'-exon encodes most of the signal sequence, and the 3'-exon encodes the proprotein region. The coding region for the mature protein portion is a perfect match with HPLC-6, and all of the regulatory and processing signals appear to be functional. The tandem repeat (approx. 8 kb) is short enough that almost two complete repeats are contained within the insert of λCharon30 AFP genomic clones. This has provided an ideal opportunity to compare nearest-neighbour genes within the clusters of tandem repeats at the nt sequence level, and thereby to assess their coding potential and sequence conservation relative to the AFP genes that are linked but irregularly spaced.

RESULTS AND DISCUSSION

(a) Cloning and sequencing the tandemly repeated AFP genes
Four λCharon30 genomic clones (λ1A-1, λ2A-7, λ3-3 and λ4-2) were picked for a pairwise comparison of nearest-

[Fig. 1 image: restriction maps of the genomic clones λ4-2, λ2A-7, λ1A-1 and λ3-3]

Fig. 1. Restriction maps of genomic clones containing tandemly repeated AFP gene pairs of the winter flounder. These genomic clones were previously referred to (Scott et al., 1985) by the designations in parentheses: λ4-2 (λAP-1); λ2A-7 (λAP-2); λ1A-1 (λAP-4); and λ3-3 (λAP-…). Flanking open boxes represent part of the λCharon30 arms; asterisks at these junctions represent a BamHI site generated by ligation during cloning. Restriction sites are BamHI (B), KpnI (K), SstI (blackened square) and SalI (S); B/K indicates that the order of these sites has not been determined. The AFP genes (approx. 1 kb) encoding HPLC-6 and -8 are underlined with short solid lines and dotted lines, respectively. Restriction fragment length polymorphisms in the SstI fragment encompassing the gene are designated 'a', 'b', or 'c'. These SstI fragments were subcloned into SstI-digested pTZ18R (Pharmacia, Piscataway, NJ) and transfected into E. coli NM522. The orientation of the insert was determined from SalI digestion. In the course of this investigation it became apparent that the restriction map for λ3-3 in the original report (Scott et al., 1985) was incorrect, both for the positioning of a SalI site and for the order of the SstI polymorphism, which is 'b'-'a' rather than 'a'-'b'. The corrected version is shown here. λ3-3 has the same arrangement of restriction sites as λ1A-1 and the same order of the SstI polymorphism. Although it is a distinct genomic clone, because its insert is longer at both ends and the junction with the long arm has regenerated a BamHI site not present in λ1A-1, it probably contains the same two genes.


[Fig. 2 image: nucleotide sequence (nt 1-2918) of the λ2A-7b SstI fragment, with the deduced aa sequence shown beneath the coding region]

Fig. 2. The nt sequence of a tandemly repeated winter flounder AFP gene. The sequence of the λ2A-7b SstI fragment is shown with the intron in lower-case letters and the aa sequence in the one-letter code underneath. The CAAT and TATA box sequences, polyadenylation signal and flanking SstI sites are boxed and the tsp is indicated by a triangle. The nt sequences were determined by the dideoxy method (Sanger et al., 1977) using Sequenase (US Biochemicals, Cleveland, OH) and synthetic oligodeoxyribonucleotide primers. Single-stranded template DNA was produced from the cloned SstI fragments following the addition of helper phage M13K07 (Pharmacia) according to the manufacturer's instructions. The 2A-7b sequence has been deposited in GenBank (accession No. M62415).

neighbour AFP genes in the tandem repeats (Fig. 1). These clones were chosen from a previously characterized set (Scott et al., 1985) because they contain two complete genes in adjacent 7.8-kb repeats. In each genomic clone the two AFP genes were present in different-sized SstI fragments ('a', 'b', or 'c'). The SstI polymorphism allowed the separation and identification of the two genes following shotgun cloning of the SstI-digested genomic clones. Four examples of the SstI 'b' polymorphism and two each of the SstI 'a' and 'c' polymorphisms were observed. Three pairwise relationships were present: 'b'-'c', 'c'-'b' and 'b'-'a'. The sequence of a representative SstI fragment, the 'b' polymorphism from λ2A-7, is shown in Fig. 2. It extends the sequence from nt positions 460-1555 reported by Scott et al. (1988a). The fragment is 2918 bp long and includes over 1.4 kb of 3'-flanking sequence beyond the preferred polyadenylation signal beginning at nt position 1429. Two other, distal polyadenylation signals are located in this region, one beginning at nt position 1769 and the other at nt position 2087. These sites are used infrequently during transcription in flounder liver (Rancourt et al., 1987). However, when expression of a heat shock protein 70 promoter/AFP fusion gene is induced by heat shock in transgenic Drosophila melanogaster, the most distal polyadenylation site accounts for a significant fraction of AFP mRNA polyadenylation (Rancourt et al., 1987).

(b) Length polymorphism in the tandem repeats
A comparison of the equivalent 3'-flanking regions from the SstI 'a', 'b', and 'c' fragments revealed the basis of the length polymorphisms (Fig. 3). They are due to insertions/deletions flanking the pentanucleotide sequence 5'-GTTGT located at nt positions 2399-2403. The 'a' and 'b' SstI polymorphisms each have a 215-bp sequence 5' to the pentanucleotide which is absent from the 'c' polymorphism. The 'a' SstI polymorphism has a 155-bp sequence 3' to the pentanucleotide which is absent in the 'c' type and reduced to 5 bp in the 'b' type. These size polymorphisms have no apparent functional significance, but their interspersion within the tandem repeats demonstrates that the extent of homogenization or conversion between repeats is not absolute.

[Fig. 3 image: the region flanking the central 5'-GTTGT pentanucleotide in the 'a' type (3068 bp), 'b' type (2918 bp) and 'c' type (2698 bp) SstI fragments, showing the 215-bp segment 5' of the pentanucleotide and the 155-bp (or 5-bp) segment 3' of it]

Fig. 3. RFLP in the SstI fragments containing the AFP genes. The basis for RFLP in the 'a', 'b', and 'c' SstI fragments is illustrated by the presence or absence of nt sequences flanking the central 5'-GTTGT sequence. The numbering scheme relates to the 'b' type sequence shown in Fig. 2.
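The segment sizes given for the three SstI types can be checked against the fragment lengths in Fig. 3 with simple arithmetic. The sketch below assumes, as the text implies, that the 'a', 'b' and 'c' fragments differ only at the 215-bp segment 5' of GTTGT and at the 155-bp (or residual 5-bp) segment 3' of it; the constant and function names are illustrative only.

```python
# Segment sizes taken from section (b); the 'b' fragment length (2918 bp) is from Fig. 2.
B_TYPE_BP = 2918
INDEL_5P_BP = 215    # 5' of GTTGT: present in 'a' and 'b', absent from 'c'
INDEL_3P_A_BP = 155  # 3' of GTTGT in the 'a' type
INDEL_3P_B_BP = 5    # residual 3' segment in the 'b' type; absent from 'c'


def sst1_fragment_lengths(b_type_bp: int = B_TYPE_BP) -> dict[str, int]:
    """Predict the three SstI fragment sizes, assuming they differ only at the two indels."""
    # Sequence shared by all three types = 'b' fragment minus its two variable segments.
    shared = b_type_bp - INDEL_5P_BP - INDEL_3P_B_BP
    return {
        "a": shared + INDEL_5P_BP + INDEL_3P_A_BP,
        "b": shared + INDEL_5P_BP + INDEL_3P_B_BP,
        "c": shared,
    }


print(sst1_fragment_lengths())  # {'a': 3068, 'b': 2918, 'c': 2698}
```

The predicted 3068 bp and 2698 bp agree with the sizes labelled for the 'a' and 'c' types in Fig. 3, which is consistent with the two insertions/deletions being the only length differences between the fragments.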

(c) Pairwise comparisons of AFP gene sequences
To further explore the question of gene conversion/homogenization between repeats, 900 bp of AFP gene sequence extending from the 5'-flanking region to the 3'-untranslated region of four pairs of neighbouring repeats were compared (Fig. 4). There are a number of changes among six of the eight genes, but there is complete identity between the genes in the 'b' SstI polymorphism of λ1A-1 and λ3-3, and between the genes in the 'a' SstI polymorphism from the same clones. Although the gene pairs in these two genomic clones could be identical because of a recent duplication or gene conversion event (Slightom et al., 1980), it is more likely that slightly different lengths of the same DNA, or an identical allelic pair, were cloned into the two λ phages. Of the six different genes, only one (1A-1a) encodes the HPLC-8 AFP component. Because this gene occurs alongside 1A-1b, which encodes the HPLC-6 component, it appears that genes encoding the two major components are interspersed, as are the SstI fragment length polymorphisms. The HPLC-8-encoding genes may also be linked to the SstI 'a' polymorphism, since the three distinct 'b' and two 'c' polymorphisms are associated with HPLC-6-encoding genes. Although the small sample size does not allow more than speculation on this point, it is noteworthy that HPLC-6 and -8 are present in serum in the ratio 60:40, and that this is quite close to the ratio of the SstI polymorphisms 'b' + 'c' vs. 'a' (Scott et al., 1985). The genes in tandem repeats differ from each other but are far more homogeneous as a group than are the linked but irregularly spaced genes (Davies et al., 1984; Gauthier et al., 1990; Davies and Gauthier, 1992).
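Both major components are built from the 11-aa repeat described in the Introduction. As an illustration, the 37-aa HPLC-6 sequence (which the deduced aa sequence of λ2A-7b in Fig. 2 matches) can be scanned for the two fixed positions of that repeat, Thr at position 1 and Asx (Asp or Asn) at position 4. This is a minimal sketch: it does not test the 'Alx' positions, because 'Ala or another helix-favouring aa' has no fixed residue list, and the function name is hypothetical.

```python
# Mature HPLC-6 in one-letter code (37 aa).
HPLC6 = "DTASDAAAAAALTAANAKAAAELTAANAAAAAAATAR"


def repeat_starts(seq: str, period: int = 11) -> list[int]:
    """Return 0-based starts of full-length windows with Thr at repeat position 1
    and Asp/Asn at repeat position 4 (only these two anchor positions are checked)."""
    return [
        i
        for i in range(len(seq) - period + 1)
        if seq[i] == "T" and seq[i + 3] in "DN"
    ]


starts = repeat_starts(HPLC6)
print(starts)                             # [1, 12, 23]
print([HPLC6[i:i + 11] for i in starts])  # ['TASDAAAAAAL', 'TAANAKAAAEL', 'TAANAAAAAAA']
```

The three hits are spaced exactly 11 residues apart, corresponding to the three 33-bp repeats of the 3'-exon; a four-repeat variant such as the product of gene 21a would be expected to give four such windows.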
Similarly, nearest-neighbour genes in the repeats are more alike (Table I) than neighbouring pairs of the irregularly spaced genes such as E3 and F2, and 11-3 and 5a (Davies et al., 1984; Davies and Gauthier, 1992). This suggests that the tandem repeat organization is conducive to AFP gene homogenization by the same mechanisms that keep the tandemly repeated histone genes (Hentschel and Birnstiel, 1981) and ribosomal RNA gene repeats (Tartoff, 1973) conserved. Cross-over fixation has been implicated in this process (for a review, see Maeda and Smithies, 1986). This mechanism requires multiple occurrences of unequal homologous recombination, and the opportunity for unequal recombination increases as the number of tandemly repeated genes increases. Experiments with null-cutting restriction enzymes indicated that several clusters of five to eight tandem repeats were present in the flounder genome and that at least some of these clusters were closely linked (Scott et al., 1985). Indeed, this mechanism would be consistent with the wide variation in the number of tandem repeats seen between individuals within one population and between individuals from several geographical locations (Hayes et al., 1991), as well as between closely related righteye flounders (Scott et al., 1988a).

[Fig. 4 image: alignment of nt 501-1400 of the genes 2A-7b, 1A-1b, 3-3b, 4-2b, 1A-1a, 3-3a, 2A-7c and 4-2c, with dots marking nt identical to 2A-7b and the aa sequence shown above the coding region]

Fig. 4. Comparison of neighbouring AFP genes from the tandem repeats. The sequences of eight AFP genes are compared from between the CAAT and TATA boxes (bp 501) to just downstream from the stop codon (bp 1400). The numbering scheme and designations are as described for Figs. 1 and 2; aa are aligned with the second nt of each codon. Letters (a, b, c) attached to each gene designation indicate the SstI polymorphic class. The numbers in parentheses in the intron refer to the total number of dG residues in the oligo(dG) tract. Asterisks denote nt deletions. The sequences listed have been deposited in GenBank under the accession Nos. shown in parentheses: 2A-7b (M62415); 1A-1b/3-3b (M62414); 4-2b (M62416); 1A-1a/3-3a (M62413); 2A-7c (M62414) and 4-2c (M62617).

TABLE I
Pairwise comparison of base changes between AFP genes (a)

              1A-1a/3-3a     1A-1b/3-3b     2A-7b         2A-7c         4-2b
1A-1b/3-3b    24 (11+1+12)
2A-7b         20 (8+0+12)    12 (11+1+0)
2A-7c         22 (8+0+14)    16 (11+1+4)    4 (0+0+4)
4-2b          23 (9+1+13)    12 (10+1+1)    11 (9+1+1)    13 (9+1+3)
4-2c          26 (11+1+14)   11 (8+1+2)     12 (9+1+2)    14 (9+1+4)    6 (4+1+1)

(a) The total number of differences between genes was scored from the 900 bp of sequence presented in Fig. 4. Length variation in the oligo(dG) tract at nt position 1037 was scored as a single change, as was the insertion/deletion of 2 bp at nt position 877. The gene pairs 1A-1a and 1A-1b were considered to be the same sequences as 3-3a and 3-3b, respectively. The three numbers in parentheses after each total are the number of changes ahead of the oligo(dG) tract (out of 536 bp), a change in the length of the oligo(dG) tract, and the number of changes after the oligo(dG) tract (out of 350 bp), respectively.
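The scoring rules in the Table I footnote can be expressed as a short procedure. The sketch below is a hypothetical reconstruction, not the author's method: it assumes each gene's 900-bp region has already been aligned and split at the oligo(dG) tract, with '-' marking gaps. Substitutions count once each, any difference in oligo(dG) length counts once, and (generalizing the footnote's treatment of the 2-bp indel at nt 877) every run of consecutive gap columns counts as one change. The `AlignedGene` type and all function names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class AlignedGene:
    """Hypothetical container for one gene's region, pre-split at the oligo(dG) tract."""
    upstream: str    # aligned sequence ahead of the tract ('-' marks a gap)
    dg_length: int   # number of dG residues in the oligo(dG) tract
    downstream: str  # aligned sequence after the tract


def count_changes(a: str, b: str) -> int:
    """Count differences between two equal-length aligned sequences.

    Each mismatched column counts once; a run of consecutive gap columns
    counts as a single insertion/deletion event.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    changes = 0
    in_gap = False
    for x, y in zip(a, b):
        if x == y:
            in_gap = False
        elif x == "-" or y == "-":
            if not in_gap:
                changes += 1  # first column of an indel run
            in_gap = True
        else:
            changes += 1      # single-bp substitution
            in_gap = False
    return changes


def score_pair(g1: AlignedGene, g2: AlignedGene) -> tuple[int, tuple[int, int, int]]:
    """Return (total, (upstream, oligo(dG), downstream)) in the style of Table I."""
    up = count_changes(g1.upstream, g2.upstream)
    dg = int(g1.dg_length != g2.dg_length)  # any length difference scores as one change
    down = count_changes(g1.downstream, g2.downstream)
    return up + dg + down, (up, dg, down)


def percent_identity(total_changes: int, region_bp: int = 900) -> float:
    """Approximate identity over the compared region (indels counted as single events)."""
    return 100.0 * (1 - total_changes / region_bp)


# Toy sequences for illustration only; these are not data from Fig. 4.
g1 = AlignedGene("ACGTAC", 14, "TTGACA")
g2 = AlignedGene("ACTTAC", 11, "TT--CA")
print(score_pair(g1, g2))              # (3, (1, 1, 1))
print(round(percent_identity(26), 1))  # 97.1
```

Applied to the largest total in Table I (26 changes, 4-2c vs. 1A-1a), the identity works out to about 97.1% over the 900-bp region, which is the basis for the ">97% identical" figure among the six genes.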

(d) 'Hot-spots' for recombination
Three sequences are worth commenting on in this regard. The first is the oligo(dG) tract towards the 3' end of the intron, which is variously 9, 11, 14 or 16 bp long. Oligo(dG) tracts have been identified as preferred sites of attack by a recombination-associated nuclease (Ruiz-Carrillo and Renaud, 1987). Unequal crossing-over initiated at the oligo(dG) tracts could account for the homopolymer length variation. The second sequence is the 33-bp segment coding for the 11-aa repeat which, by virtue of its tandem repetition, might enhance the frequency of recombination. Indeed, the occurrence of flounder AFP gene and cDNA variants containing four, and even five, of the 11-aa repeats is highly suggestive of unequal crossing-over in this region of the gene (Lin and Gross, 1981; Gourlie et al., 1984; Gauthier et al., 1990). Maeda and Smithies (1986) have pointed out that genes coding for proteins with repetitive aa sequences, such as silk fibroin, maize zein and the proline-rich salivary proteins, often show length polymorphisms due to differences in the number of repeats. The third sequence is GGXGG, which occurs frequently in the 33-bp repeats and in the 'pro'-region. This sequence is contained within the Chi-site consensus, which is a 'hot-spot' for recombination in both prokaryotes and eukaryotes (Jeffreys et al., 1985). It is thought to be responsible for abundant rearrangements in the chorion gene locus, where it corresponds to adjacent glycine codons (T. Eickbush, pers. commun.). In the AFP genes this sequence appears most frequently as the complement of two GCC codons, the preferred codon for alanine. It occurs 13 times in the 190-bp 3'-exon of 2A-7b. When the genes are compared with each other there are typically 9-10 bp changes upstream from the oligo(dG) tract, whereas there are no changes in the 2A-7b vs. 2A-7c nearest-neighbour comparison and only four in the 4-2c vs. 4-2b comparison (Table I). Downstream from the oligo(dG) tract the 'b' and 'c' polymorphisms show few changes (0-4) from each other, but 12-14 changes from the 'a' polymorphism (Table I). This indicates that the regions on either side of the oligo(dG) tract may have undergone gene conversion events as independent units, which is consistent with the oligo(dG) tract serving as an 'initiator sequence' for recombination.

(e) Evolution of the tandem repeats
The similarity of the AFP genes in tandem repeats and their lack of large insertions or deletions is to be expected, given the enhanced opportunities for gene homogenization mechanisms to operate. However, it is not possible to gauge how much of their similarity is due to gene conversion and how much to unequal crossing-over. The latter is almost certainly operative, based on the observation that a closely related righteye flounder, the yellowtail flounder (Limanda ferruginea), lacks a tandemly amplified set of AFP genes (Scott et al., 1988a). The AFP produced from this species' linked but irregularly spaced gene family contains four 11-aa repeats, whereas the expansion of the tandemly repeated AFP gene set in winter flounder has led to the dominance of AFPs with three 11-aa repeats. A similar example of the recent emergence of a tandemly repeated AFP gene set is seen in eelpouts, which are unrelated to flounders and produce a completely different AFP type (Hew et al., 1988; Scott et al., 1988b). Indeed, it has been suggested that the rapid expansion of a tandemly repeated AFP gene set could be the mechanism by which these and other teleosts adapted to a sudden change in climate that increased the severity of sea-level glaciation (Scott et al., 1986).

(f) Conclusions
(1) Six AFP genes from the tandemly repeated set in winter flounder are >97% identical and have no major insertions or deletions of sequence within the gene, in contrast to members of the AFP multigene family outside the tandem repeats.
(2) Tandemly repeated AFP genes code for the two major AFP components and belong to one of three interspersed polymorphisms that are due to length variation in the 3'-flanking DNA.
(3) It is likely that gone conversion and unequal crossing-over both act to keep the tandemly repeated AFP genes more homogeneous than the irregularly spaced genes outside the repeats. (4) Potential 'hot-spots' for recombination between the tandem repeats are an oligo(dG) tract in the intron, which varies in length from gone to gone, and three 33-bp repeats in the 3'-exon. ACKNOWLEDGEMENTS I would like to thank Sherry Gauthier for technical assistance and Patrick Blader for help in subcloning the A F P genes. This work was supported by the Medical Research Council of Canada.
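The GGXGG motif counting and the identity arithmetic used in the discussion above can be illustrated with a short Python sketch. It is not the analysis used in this study: the sequences are toy strings rather than the 2A-7b or 4-2 gene sequences, the function names are hypothetical, counting GGXGG on both strands with overlaps is an assumed convention, and percent identity assumes a gap-free alignment (which ignores the length variation in the oligo(dG) tract).

```python
import re

# GGXGG: G, G, any base, G, G. A zero-width lookahead lets overlapping
# occurrences (e.g. in GGCGGCGG...) each be counted.
GGXGG = re.compile(r"(?=GG[ACGT]GG)")

# Base-pairing table used to build the complementary strand.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def count_ggxgg(seq: str) -> int:
    """Count GGXGG sites on both strands of seq, overlaps included.

    Scanning both strands is an assumption of this sketch; the text does
    not state how its motif counts were tallied.
    """
    seq = seq.upper()
    forward = sum(1 for _ in GGXGG.finditer(seq))
    reverse = sum(1 for _ in GGXGG.finditer(reverse_complement(seq)))
    return forward + reverse


def percent_identity(a: str, b: str) -> float:
    """Percent identity of two equal-length, gap-free aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def changes_either_side(a: str, b: str, split: int) -> tuple[int, int]:
    """Count single-bp changes before and after alignment position `split`."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    upstream = sum(x != y for x, y in zip(a[:split], b[:split]))
    downstream = sum(x != y for x, y in zip(a[split:], b[split:]))
    return upstream, downstream


if __name__ == "__main__":
    # Two adjacent alanine codons: the GGXGG site lies on the complementary strand.
    print(count_ggxgg("GCCGCC"))  # 1
    # Toy 900-bp alignment with 26 single-bp changes (not real AFP sequence).
    a = "A" * 900
    b = "C" * 26 + "A" * 874
    print(round(percent_identity(a, b), 1))  # 97.1
```

Run as a script, it reports one GGXGG site for a pair of adjacent GCC alanine codons (the site is GGCGG on the complementary strand) and about 97.1% identity for 26 single-bp changes over 900 aligned bp, the maximum divergence reported among the six tandemly repeated genes. `changes_either_side` mirrors the upstream/downstream tally of changes around the oligo(dG) tract, given the tract's position in the alignment.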

REFERENCES

Davies, P.L. and Hew, C.L.: Biochemistry of fish antifreeze proteins. FASEB J. 4 (1990) 2460-2468.
Davies, P.L. and Gauthier, S.Y.: Antifreeze protein pseudogenes. Gene 112 (1992) 171-178.
Davies, P.L., Roach, A.H. and Hew, C.L.: DNA sequence coding for an antifreeze protein precursor from winter flounder. Proc. Natl. Acad. Sci. USA 79 (1982) 335-339.
Davies, P.L., Hough, C., Scott, G.K., Ng, N., White, B.N. and Hew, C.L.: Antifreeze protein genes of the winter flounder. J. Biol. Chem. 259 (1984) 9241-9247.
Davies, P.L., Hew, C.L. and Fletcher, G.L.: Fish antifreeze proteins: physiology and evolutionary biology. Can. J. Zool. 66 (1988) 2611-2617.
DeVries, A.L.: Antifreeze peptides and glycopeptides in cold-water fishes. Annu. Rev. Physiol. 45 (1983) 245-260.
Fletcher, G.L., Kao, M.H. and Fourney, R.M.: Antifreeze peptides confer freezing resistance to fish. Can. J. Zool. 64 (1986) 1897-1901.
Fourney, R.M., Fletcher, G.L. and Hew, C.L.: The effects of long day length on liver antifreeze mRNA in the winter flounder, Pseudopleuronectes americanus. Can. J. Zool. 62 (1984a) 1456-1460.
Fourney, R.M., Joshi, S.B., Kao, M.H. and Hew, C.L.: Heterogeneity of antifreeze polypeptides from the Newfoundland winter flounder, Pseudopleuronectes americanus. Can. J. Zool. 62 (1984b) 28-33.
Gauthier, S., Wu, Y. and Davies, P.L.: Nucleotide sequence of a variant antifreeze protein gene. Nucleic Acids Res. 18 (1990) 5303.
Gourlie, B., Lin, Y., Price, J., DeVries, A.L., Powers, D. and Huang, R.C.C.: Winter flounder antifreeze proteins: a multigene family. J. Biol. Chem. 259 (1984) 14960-14965.
Hayes, P.H., Davies, P.L. and Fletcher, G.L.: Population differences in antifreeze protein gene copy number and arrangement in winter flounder. Genome 34 (1991) 174-177.
Hentschel, C.C. and Birnstiel, M.L.: The organization and expression of histone gene families. Cell 25 (1981) 301-313.
Hew, C.L., Wang, N.C., Joshi, S., Fletcher, G.L., Scott, G.K., Hayes, P.H., Buettner, B. and Davies, P.L.: Multiple genes provide the basis for antifreeze protein diversity and dosage in the ocean pout, Macrozoarces americanus. J. Biol. Chem. 263 (1988) 12049-12055.
Jeffreys, A.J., Wilson, V. and Thein, S.L.: Hypervariable 'minisatellite' regions in human DNA. Nature 314 (1985) 67-73.
Lin, Y. and Gross, J.K.: Molecular cloning and characterization of winter flounder antifreeze cDNA. Proc. Natl. Acad. Sci. USA 78 (1981) 2825-2829.
Lin, Y. and Long, D.J.: Purification and characterization of winter flounder antifreeze peptide messenger ribonucleic acid. Biochemistry 19 (1980) 1111-1116.
Maeda, N. and Smithies, O.: The evolution of multigene families: human haptoglobin genes. Annu. Rev. Genet. 20 (1986) 81-108.
Pickett, M.H., Hew, C.L. and Davies, P.L.: Seasonal variation in the level of antifreeze protein mRNA from the winter flounder. Biochim. Biophys. Acta 739 (1983) 97-104.
Pickett, M., Scott, G., Davies, P.L., Wang, N., Joshi, S. and Hew, C.L.: Sequence of an antifreeze protein precursor. Eur. J. Biochem. 143 (1984) 35-38.
Rancourt, D.E., Walker, V.K. and Davies, P.L.: Flounder antifreeze protein synthesis under heat shock control in transgenic Drosophila melanogaster. Mol. Cell. Biol. 7 (1987) 2188-2195.
Ruiz-Carrillo, A. and Renaud, J.: Endonuclease G: a (dG)n·(dC)n-specific DNase from higher eukaryotes. EMBO J. 6 (1987) 401-407.
Sanger, F., Nicklen, S. and Coulson, A.R.: DNA sequencing with chain-terminating inhibitors. Proc. Natl. Acad. Sci. USA 74 (1977) 5463-5467.
Scott, G.K., Hew, C.L. and Davies, P.L.: Antifreeze protein genes are tandemly linked and clustered in the genome of the winter flounder. Proc. Natl. Acad. Sci. USA 82 (1985) 2613-2617.
Scott, G.K., Fletcher, G.L. and Davies, P.L.: Fish antifreeze proteins: recent gene evolution. Can. J. Fish. Aquat. Sci. 43 (1986) 1028-1034.
Scott, G.K., Davies, P.L., Kao, M.H. and Fletcher, G.L.: Differential amplification of antifreeze protein genes in the Pleuronectinae. J. Mol. Evol. 27 (1988a) 29-35.
Scott, G.K., Hayes, P.H., Fletcher, G.L. and Davies, P.L.: Wolffish antifreeze protein genes are primarily organized as tandem repeats that each contain two genes in inverted orientation. Mol. Cell. Biol. 8 (1988b) 3670-3675.
Slightom, J.L., Blechl, A.E. and Smithies, O.: Human fetal Gγ- and Aγ-globin genes: complete nucleotide sequences suggest that DNA can be exchanged between these duplicated genes. Cell 21 (1980) 627-638.
Tartof, K.D.: Unequal mitotic sister chromatid exchange and disproportionate replication as mechanisms regulating ribosomal RNA gene redundancy. Cold Spring Harbor Symp. Quant. Biol. 38 (1973) 491-500.
