Plant Molecular Biology 18: 1019-1022, 1992. © 1992 Kluwer Academic Publishers. Printed in Belgium.


Update section

Short communication

A new homeobox-leucine zipper gene from Arabidopsis thaliana

Jim Mattsson, Eva Söderman, Marie Svenson, Chumpol Borkird and Peter Engström*

Department of Physiological Botany, University of Uppsala, Box 540, S-751 21 Uppsala, Sweden (*author for correspondence)

Received 15 October 1991; accepted in revised form 6 January 1992

Key words: Arabidopsis, homeobox, homeodomain, leucine zipper

Abstract

We have isolated a homeobox-containing gene from Arabidopsis thaliana using a degenerate oligonucleotide probe corresponding to the most conserved region of the homeodomain. This strategy has been used previously to isolate homeobox-containing genes from Caenorhabditis and, recently, from A. thaliana. The Arabidopsis genes have an unusual structure: a leucine zipper motif lies adjacent to the carboxy-terminal region of the homeodomain, a feature not found in homeobox-containing genes isolated from animals. We report the isolation and primary structure of a new member of this Arabidopsis homeobox-leucine zipper gene family. The new gene encodes a homeodomain and leucine zipper motif similar to those of the two previously identified genes, but differs from them in the region corresponding to the carboxy terminus of the polypeptide, as well as in the size and isoelectric point of the encoded protein.

The homeobox is a conserved 183 bp DNA sequence found in the coding sequence of a class of transcription factors in animals, including humans, as well as in yeast [7]. The secondary structure of the homeodomain suggests that it contains three α-helices, one of which (helix 3) recognizes and binds to specific DNA sequences [3, 5]. In Drosophila melanogaster, where embryo development depends on the hierarchical and sequential expression of a set of homeobox genes, as well as in other animals, these genes have important roles in development and in the determination of cell fate [7]. The high degree of conservation of homeobox sequences between different genes was exploited by Bürglin et al. [1] to design a set of degenerate oligonucleotide probes corresponding to helix 3. These probes were used to isolate putative homeobox-containing genes from Caenorhabditis elegans [1] as well as from Planaria [2]. Recently, homeobox-containing genes have also been isolated from a plant, Arabidopsis thaliana, using the same strategy [6]. This, together with the demonstration that the Kn-1 gene of maize contains a homeobox [8], constitutes the first documented occurrence of homeobox genes in plants.

We have screened a cDNA library constructed from poly(A)+ RNA from A. thaliana plants [4] using one of the degenerate pools of short (23 bp) oligonucleotides, HB-1, and a protocol previously used to isolate C. elegans homeobox clones [1]. These oligonucleotides match all possible codon combinations for a highly conserved stretch of amino acids in the Antennapedia class of homeodomains. Of four positive clones from a first screen of the library, one showed significant sequence similarity to known homeoboxes. The complete nucleotide sequence of this clone is shown in Fig. 1. The clone is 656 bp long and contains a 23 bp sequence that matches the probe sequence in 21 positions (Fig. 2). The 2 mismatched bases lie at the 3' end of the oligonucleotide. The clone contains one long open reading frame between nucleotides 97 and 550, corresponding to a protein of 151 amino acid residues with a molecular weight of 17 600 and a calculated isoelectric point of 9.2. This amino acid sequence contains a stretch of 61 residues (48-108) that shows a distinct similarity to homeodomains from other proteins.

The nucleotide sequence data reported will appear in the EMBL, GenBank and DDBJ Nucleotide Sequence Databases under the accession number X62644.
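Locating an open reading frame like the one reported above amounts to scanning each forward reading frame for an ATG and the next in-frame stop codon. The Python sketch below is a minimal illustration under stated assumptions: the standard genetic code, the forward strand only, and ATG as the only start codon. `longest_orf` is a hypothetical helper name, and the 25-nt input is a made-up toy sequence, not part of the Athb-3 clone.

```python
# Standard genetic code: codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def longest_orf(dna: str) -> tuple[int, int, str]:
    """Return (start, end, protein) for the longest ATG-initiated ORF.

    Only the forward strand is scanned. Positions are 1-based and inclusive,
    and `end` is the last base of the stop codon. ORFs that run off the end
    of the sequence without a stop codon are ignored. Returns (0, 0, "") if
    no complete ORF is found.
    """
    dna = dna.upper()
    best = (0, 0, "")
    for frame in range(3):
        i = frame
        while i + 3 <= len(dna):
            if dna[i:i + 3] == "ATG":
                protein = []
                j = i
                while j + 3 <= len(dna):
                    aa = CODON_TABLE.get(dna[j:j + 3], "X")  # X: ambiguous base
                    if aa == "*":
                        if len(protein) > len(best[2]):
                            best = (i + 1, j + 3, "".join(protein))
                        break
                    protein.append(aa)
                    j += 3
                i = j  # resume after this ORF (or fall off the sequence end)
            i += 3
    return best


print(longest_orf("CCATGGCTAAATGGTTTCAGTAACC"))  # (3, 23, 'MAKWFQ')
```

Whether a reported end coordinate includes the stop codon is a convention choice; this sketch includes it.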

[Fig. 1 image: nucleotide sequence of the 656 bp Athb-3 clone, numbered every 60 nucleotides, with the deduced amino acid sequence]

Fig. 1. Nucleotide sequence and deduced amino acid sequence of the Arabidopsis clone Athb-3. The homeobox sequence and the corresponding homeodomain are underlined. The leucines and isoleucine in the proposed leucine-zipper motif are marked with asterisks.


The highest homology was found to Athb-1 and Athb-2, which is not unexpected since these genes are from the same species. Accordingly, we have named the clone Athb-3 (Arabidopsis thaliana homeobox). The overall amino acid sequence homology between Athb-3 and Athb-1 is 47%. Within the homeodomain, 48 of 61 amino acid residues are identical between Athb-3 and Athb-1, and 31 between Athb-3 and Athb-2, for which only a partial sequence has been published. Among representative homeodomains from the animal kingdom, the highest homology is to Antennapedia (Antp), with 23 residues identical to Athb-3, followed by hox2.4, Deformed (Dfd) and hox1.5, each with 22 identical residues [6, 7]. The homeodomain of the maize Kn-1 protein is more distantly related to those of the Arabidopsis proteins. It shows the highest homology to other homeodomains when the residues at positions 22, 23 and 24 are omitted from the Kn-1 sequence [8]; with these residues omitted, Kn-1 and Athb-3 are identical at 17 positions. Among the sequences examined, the homeodomains most distantly related to that of Athb-3 were those of prd and mec3, each sharing 16 residues with Athb-3. In addition to the homeodomain, Athb-3 also contains a putative leucine zipper motif.
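Identity counts such as 48 of 61 come from comparing two sequences position by position once they have been aligned. A minimal sketch, assuming the alignment is already given as two equal-length strings with '-' marking gaps (note that Fig. 3 instead uses dashes to mark identical residues). `identity` is a hypothetical helper, and the example uses only the short homeodomain segments of Athb-3, Athb-1 and Athb-2 that can be read from Fig. 4, not the full homeodomains.

```python
def identity(a: str, b: str) -> tuple[int, float]:
    """Identical positions between two pre-aligned, equal-length sequences.

    '-' is treated as a gap and never counts as an identity. Returns the
    count and the count as a percentage of the aligned length.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    same = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return same, 100.0 * same / len(a)


# Homeodomain segment readable from Fig. 4 (same 14 columns in each protein).
ATHB3_SEG = "QIAIWFQNRRARWK"
ATHB1_SEG = "QVAVWFQNRRARWK"
ATHB2_SEG = "QVEVWFQNRRARTK"

for name, seg in [("Athb-1", ATHB1_SEG), ("Athb-2", ATHB2_SEG)]:
    n, pct = identity(ATHB3_SEG, seg)
    print(f"Athb-3 vs {name}: {n}/{len(seg)} identical ({pct:.1f}%)")
# Athb-3 vs Athb-1: 12/14 identical (85.7%)
# Athb-3 vs Athb-2: 10/14 identical (71.4%)
```

Dividing by the aligned length is one convention; dividing by the shorter sequence or by ungapped columns gives different percentages for gapped alignments.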

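[Fig. 2 image: Athb-3 sequence GCA ATA TGG TTT CAG AAC AGG AG (A I W F Q N R) aligned with the HB-1 probe]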

Fig. 2. Similarity between the HB-1 oligonucleotide probe and the corresponding sequence of Athb-3 (nucleotide positions 376 to 399, Fig. 1). Of the 23 nucleotides of the probe, Athb-3 has 2 mismatches in the 5'-proximal region.
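Because HB-1 is a pool of degenerate oligonucleotides, a match "in 21 positions" means each clone base is checked against the set of bases the pool allows at that position. A minimal sketch using the IUPAC nucleotide ambiguity codes, plus X for any base as written in Fig. 2. `mismatches` and `pool_size` are hypothetical helpers, and `HYPOTHETICAL_PROBE` is an illustrative, simplified degenerate encoding of K-I-W-F-Q-N-R-R (it covers, for example, only the AGA/AGG arginine codons); it is not the published HB-1 sequence. The Athb-3 stretch is the 23-nt sequence shown in Fig. 2.

```python
from math import prod

# IUPAC nucleotide ambiguity codes; X (any base) follows the notation of Fig. 2.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT", "X": "ACGT",
}


def mismatches(probe: str, target: str) -> list[int]:
    """1-based positions (from the 5' end as written) where the target base
    is not among the bases the degenerate probe allows."""
    if len(probe) != len(target):
        raise ValueError("probe and target must have the same length")
    return [i + 1 for i, (p, t) in enumerate(zip(probe, target))
            if t not in IUPAC[p]]


def pool_size(probe: str) -> int:
    """Number of distinct oligonucleotides represented by the degenerate probe."""
    return prod(len(IUPAC[p]) for p in probe)


HYPOTHETICAL_PROBE = "AARATHTGGTTYCARAAYAGRMG"  # illustrative only, not HB-1
ATHB3_SITE = "GCAATATGGTTTCAGAACAGGAG"          # Athb-3 stretch from Fig. 2

print(mismatches(HYPOTHETICAL_PROBE, ATHB3_SITE))  # [1, 2]
print(pool_size(HYPOTHETICAL_PROBE))               # 192
```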

The putative homeodomain of Athb-3 (Fig. 2) harbours the four amino acid residues known to be invariant in homeodomain proteins: tryptophan 49, phenylalanine 50, asparagine 52 and arginine 54 (Fig. 3). In addition, 7 of 8 other highly conserved residues are also present in our sequence, namely glutamine 13, leucine 17, phenylalanine 21, leucine 41, isoleucine 46, arginine 56 and lysine 58. At the remaining position, 6, which is commonly an arginine or a glutamine, our sequence has a lysine. However, lysine also occurs at this position in other homeodomains and is considered a conservative substitution. The relative similarity between the putative homeodomain of our sequence and representatives of different known classes of animal and yeast homeodomains [7], as well as three known plant homeodomains [6, 8], is shown in Fig. 3.
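The residue checks described above can be made explicit with a short script. A minimal sketch: the 61-residue Athb-3 homeodomain string is transcribed from Fig. 3 and agrees with every residue the text names at its stated position; the position lists are those given in the text, and `deviations` is a hypothetical helper name.

```python
# Athb-3 homeodomain as read from Fig. 3, using the text's 1-61 numbering.
ATHB3_HD = "MLGEKKKRLNLEQVRALEKSFELGNKLEPERKMQLAKALGLQPRQIAIWFQNRRARWKTKQ"

# Positions and residues as listed in the text; position 6 allows R or Q.
INVARIANT = {49: "W", 50: "F", 52: "N", 54: "R"}
HIGHLY_CONSERVED = {6: "RQ", 13: "Q", 17: "L", 21: "F",
                    41: "L", 46: "I", 56: "R", 58: "K"}


def deviations(domain: str, expected: dict[int, str]) -> dict[int, str]:
    """Map each 1-based position whose residue is not in the allowed set
    to the residue actually found there."""
    return {pos: domain[pos - 1]
            for pos, allowed in expected.items()
            if domain[pos - 1] not in allowed}


print(len(ATHB3_HD))                           # 61
print(deviations(ATHB3_HD, INVARIANT))         # {}
print(deviations(ATHB3_HD, HIGHLY_CONSERVED))  # {6: 'K'}
```

The single deviation reported is the lysine at position 6 that the text discusses as a conservative substitution.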

[Fig. 3 image: alignment of the 61-residue Athb-3 homeodomain, MLGEKKKRLNLEQVRALEKSFELGNKLEPERKMQLAKALGLQPRQIAIWFQNRRARWKTKQ, with Athb-1, Athb-2, hox2.4, hox1.5, AbdB, Labial, Oct-1, Kn-1, prd, mec3 and other homeodomains; identity counts to Athb-3 range from 48 (Athb-1) to 16 (prd, mec3)]

Fig. 3. Amino acid sequence comparison of Athb-3 and representatives of other classes of homeodomains. The N- and C-terminus of the Athb-3 homeodomain correspond to nucleotide positions 297 and 480, respectively (Fig. 1). The amino acids conserved among most or all homeodomains are indicated above the Athb-3 sequence. The numbers of amino acids identical between Athb-3 and each of the other homeodomains are shown to the right of the sequences. Dashes indicate positions of amino acid identity between Athb-3 and other sequences.

[Fig. 4 image: LEUCINE ZIPPER DOMAIN alignment of AtHB-3, AtHB-1 and AtHB-2; the homeodomain segment at the start reads
AtHB-3  QIAIWFQNRRARWK
AtHB-1  QVAVWFQNRRARWK
AtHB-2  QVEVWFQNRRARTK]

Fig. 4. Amino acid sequence comparison of Athb-3, Athb-1 and Athb-2 in parts of the sequences containing the C-terminal part of the homeodomain and the leucine zipper motif. Leucines are boxed.
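The heptad spacing of the leucine zipper, with leucines (or an isoleucine) recurring at every seventh residue, can be scanned for directly. A minimal sketch, assuming a zipper is approximated as a run of L or I residues spaced exactly seven apart; `longest_heptad_run` is a hypothetical helper, and the input is a synthetic sequence built to mimic five leucines plus one isoleucine, not the Athb-3 sequence. Real zipper prediction also weighs the residues between the leucines, which this sketch ignores.

```python
def longest_heptad_run(protein: str, allowed: str = "LI") -> tuple[int, int]:
    """Find the longest run of residues from `allowed` spaced exactly 7 apart.

    Returns (start, count): the 1-based position of the first residue in the
    run and the number of residues in it, or (0, 0) if none is found.
    """
    best = (0, 0)
    for start in range(len(protein)):
        if protein[start] not in allowed:
            continue
        # Skip starts that merely continue a run beginning 7 residues earlier.
        if start >= 7 and protein[start - 7] in allowed:
            continue
        count, pos = 0, start
        while pos < len(protein) and protein[pos] in allowed:
            count += 1
            pos += 7
        if count > best[1]:
            best = (start + 1, count)
    return best


# Synthetic sequence: six heptads whose first residue is L, L, L, I, L, L.
toy = "MS" + "".join(x + "KQEAEK" for x in "LLLILL") + "GS"
print(longest_heptad_run(toy))  # (3, 6)
```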

The putative leucine zipper motif of Athb-3 has five leucines and one isoleucine occurring at every seventh position between the end of the homeodomain and the carboxy terminus of the protein (Fig. 1). This unusual feature has previously been found only in the Arabidopsis homeobox genes Athb-1 and Athb-2 [6]. The homeodomains and leucine zipper domains of Athb-3, Athb-1 and Athb-2 align perfectly with each other (Fig. 4). The conservation not only of the leucine zipper motif but also of its position relative to the homeodomain is striking, and suggests that the distance between the homeodomain and the leucine zipper motif is critical for proper function of the protein. Like Athb-1, Athb-3 has a glutamine-rich sequence close to the N-terminal end of the homeodomain. However, sequence similarity between the two proteins towards the N-terminal end is low, and their calculated molecular weights (31 000 for Athb-1 and 17 600 for Athb-3) and isoelectric points (4.6 and 9.2, respectively) differ markedly. The three Arabidopsis genes identified thus constitute a new class of homeobox-leucine zipper genes. Interestingly, this class of genes has so far been found only in Arabidopsis, even though over a hundred homeobox-containing sequences have been identified. Whether this is a unique feature of plant homeobox-containing genes remains to be elucidated.
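The molecular weights and isoelectric points compared above are calculated values. One minimal way to obtain such estimates is to sum average residue masses plus one water for the molecular weight, and, for the pI, to find by bisection the pH at which a Henderson-Hasselbalch net charge is zero. This is a sketch under assumptions: the pKa set is illustrative, the paper does not say which method or constants it used, and a different constant set can shift the computed pI noticeably. All function names are hypothetical.

```python
# Average residue masses in daltons (residue = amino acid minus one water).
AVERAGE_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886,
    "C": 103.1388, "E": 129.1155, "Q": 128.1307, "G": 57.0519,
    "H": 137.1411, "I": 113.1594, "L": 113.1594, "K": 128.1741,
    "M": 131.1926, "F": 147.1766, "P": 97.1167, "S": 87.0782,
    "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01524

# Illustrative pKa values; published sets differ and shift the computed pI.
PKA_BASIC = {"N-term": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_ACIDIC = {"C-term": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def molecular_weight(protein: str) -> float:
    """Average molecular weight of an unmodified polypeptide chain."""
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in protein) + WATER


def net_charge(protein: str, ph: float) -> float:
    """Henderson-Hasselbalch estimate of net charge at the given pH."""
    basic = 1 / (1 + 10 ** (ph - PKA_BASIC["N-term"]))
    basic += sum(protein.count(aa) / (1 + 10 ** (ph - PKA_BASIC[aa]))
                 for aa in "KRH")
    acidic = 1 / (1 + 10 ** (PKA_ACIDIC["C-term"] - ph))
    acidic += sum(protein.count(aa) / (1 + 10 ** (PKA_ACIDIC[aa] - ph))
                  for aa in "DECY")
    return basic - acidic


def isoelectric_point(protein: str, tol: float = 1e-4) -> float:
    """Bisection for the pH at which the estimated net charge is zero.

    Net charge falls monotonically as pH rises, so bisection converges.
    """
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(protein, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(round(molecular_weight("G"), 2))  # 75.07 (free glycine)
```

Comparing such estimates with the published 17 600 and 9.2 would require the full 151-residue sequence from Fig. 1 and the same constants the authors used.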

Acknowledgements

We thank Tapio Palva and co-workers at the Swedish Agricultural University, Uppsala, Sweden, for providing us with the Arabidopsis cDNA library, and Thomas Bürglin, Harvard Medical School, Massachusetts, USA, for the gift of the HB-1 probe and a protocol for screening. This work was supported by grants from the Faculty of Sciences, Uppsala University, and the Swedish Council for Forestry and Agricultural Research (SJFR). J.M. and E.S. contributed equally to this work.

References

1. Bürglin TR, Finney M, Coulson A, Ruvkun G: Caenorhabditis elegans has scores of homeobox-containing genes. Nature 341:239-243 (1989).
2. Garcia-Fernandez J, Baguna J, Salo E: Planarian homeobox genes: cloning, sequence analysis, and expression. Proc Natl Acad Sci USA 88:7338-7342 (1991).
3. Kissinger CR, Liu B, Martin-Blanco E, Kornberg TB, Pabo CO: Crystal structure of an engrailed homeodomain-DNA complex at 2.8 Å resolution: a framework for understanding homeodomain-DNA interactions. Cell 63:579-590 (1990).
4. Nordin K, Heino P, Palva ET: Separate signal pathways regulate the expression of a low-temperature-induced gene in Arabidopsis thaliana (L.) Heynh. Plant Mol Biol 16:1061-1071 (1991).
5. Otting G, Qian YQ, Billeter M, Müller M, Affolter M, Gehring WJ, Wüthrich K: Protein-DNA contacts in the structure of a homeodomain-DNA complex determined by nuclear magnetic resonance spectroscopy in solution. EMBO J 9:3085-3092 (1990).
6. Ruberti I, Sessa G, Lucchetti S, Morelli G: A novel class of plant proteins containing a homeodomain with a closely linked leucine zipper motif. EMBO J 10:1787-1791 (1991).
7. Scott MP, Tamkun JW, Hartzell III GW: The structure and function of the homeodomain. BBA Reviews on Cancer 989:25-48 (1989).
8. Vollbrecht E, Veit B, Sinha N, Hake S: The developmental gene Knotted-1 is a member of a maize homeobox gene family. Nature 350:241-243 (1991).
