Comp. Biochem. Physiol. Vol. 103B, No. 4, pp. 759-773, 1992. Printed in Great Britain

0305-0491/92 $5.00 + 0.00

Pergamon Press Ltd

REVIEW

ORGANIZATION OF NON-VERTEBRATE GLOBIN GENES

SERGE N. VINOGRADOV,* DANIEL A. WALZ† and BILL POHAJDAK‡

Departments of *Biochemistry and †Physiology, Wayne State University School of Medicine, Detroit, MI 48201, U.S.A. (Tel. 513 577-1511); and ‡Department of Biology, Dalhousie University, Halifax, Nova Scotia, Canada B3H 4J1

(Received 15 April 1992)

Abstract: The organization of non-vertebrate globin genes exhibits substantially more variability than the three-exon, two-intron structure of the vertebrate globin genes. (1) The structures of genes of the single-domain globin chains of the annelid Lumbricus and the mollusc Anadara, and the globin gene coding for the two-domain chains of the clam Barbatia, are similar to the vertebrate plan. (2) Genes for single-domain chains exist in bacteria and protozoa. Although the globin gene is highly expressed in the bacterium Vitreoscilla, the putative globin gene hmp in E. coli, which codes for a chimeric protein whose N-terminal moiety of 139 residues contains 67 residues identical to the Vitreoscilla globin, may be either unexpressed or expressed at very low levels, despite the presence of normal regulatory sequences. The DNA sequence of the globin gene of the protozoan Paramecium, determined recently by Yamauchi and collaborators, appears to consist of two exons separated by a short intron. (3) Among the lower eukaryotes, the yeasts Saccharomyces and Candida have chimeric proteins consisting of N-terminal globin and C-terminal flavoprotein moieties of about the same size. The gene for the chimeric protein of Saccharomyces has no introns. According to Riggs, the presence of chimeric proteins in E. coli and other prokaryotes, such as Alcaligenes and Rhizobium, as well as in yeasts, suggests a previously unrecognized evolutionary pathway for hemoglobin, namely that of a multipurpose heme-binding domain attached to a variety of unrelated proteins with diverse functions.
(4) The published globin gene sequences of the insect larva Chironomus have an intron-less structure and are present as clusters of multiple copies; the expression of the globin genes is tissue- and developmental stage-specific. Furthermore, the expression of many of these genes has not yet been demonstrated despite the presence of apparently normal regulatory sequences in the two flanking regions. Unexpectedly, Bergtrom and collaborators have recently shown that at least three Ctt globin IIβ genes contain putative introns. (5) Pohajdak and collaborators have found a seven-exon, six-intron structure for the globin gene of the nematode Pseudoterranova, which codes for a two-domain globin chain. Although the second and fourth introns of the N-terminal domain correspond to the two introns found in vertebrate globin genes, the position of the third intron is close to that of the central intron in plant hemoglobins. (6) A four-exon, three-intron structure appears to be general for the genes of plant globins. The latter include the leghemoglobins, globins occurring in legumes (e.g. the four lb loci of the soybean Glycine max), in the highly differentiated structures (nodules) developed in the presence of symbionts (the proteobacteria Rhizobium, Bradyrhizobium and Azorhizobium and the actinomycete Frankia) and the globins present in symbiont-containing non-leguminous plants and in symbiont-free plants. Although the leghemoglobin genes occur in multiple copies at two separate loci, some plants have a single globin gene locus. The lb loci are developmentally expressed at a high level and are regulated in a tissue-specific manner upon induction by the bacterial symbiont.
The widespread, if sporadic, occurrence of globin-like genes in diverse groups of living organisms suggests that globin-like genes may be ubiquitous and thus likely to have descended from a single, ancient, globin-like gene coding for a monomeric, single-chain, single-domain globin, which existed prior to the time of divergence of prokaryotes and eukaryotes.

INTRODUCTION

Vertebrate globin genes represent one of the most extensively studied gene families. The structure of all alpha-like and beta-like globin genes exhibits a remarkable evolutionary stability throughout the vertebrate phylum. Based on parsimonious alignments of these genes, two introns interrupt the coding region at precisely homologous positions (Hardison, 1991). Vertebrate myoglobin genes also exhibit this configuration (Blanchetot et al., 1983, 1986; Weller et al., 1984). The evolutionary stability of the globin gene structure is mirrored in the uniformity of vertebrate hemoglobin and myoglobin structures characterized by the persistence of the "myoglobin fold" (Lesk and Chothia, 1980; Bashford et al., 1987). Non-vertebrate hemoglobins and myoglobins are present episodically in most, but not all, phyla (Keilin, 1956) and exhibit a much greater variability in the amino acid sequences of their constituent chains than their vertebrate counterparts. In contrast to the latter, many non-vertebrate hemoglobins also have a propensity towards aggregation into large structures (Vinogradov, 1985). However, despite their differences, the vertebrate and invertebrate globins, including the plant globins, appear to be clearly related (Zharkikh et al., 1984; Goodman et al., 1988). The accumulation of over 100 amino acid sequences of non-vertebrate globins has clearly revealed the span of their variability, ranging from multiple, single-chain globins containing one heme-binding domain, including truncated single chains, to chimeric globin chains containing two to nine heme-binding domains, including truncated two-domain chains (Vinogradov et al., 1992). Concomitant determination of the gene structures of some of the non-vertebrate globins has also provided evidence for similarities as well as for differences between them and vertebrate globin gene structures. We describe briefly, in this review, the present status of our knowledge of the organization of non-vertebrate globin genes.

STRUCTURE OF SINGLE-DOMAIN GLOBIN GENES

(a) Extracellular globin of Lumbricus terrestris

The extracellular, hexagonal bilayer hemoglobin of the common North American earthworm Lumbricus terrestris (Mr ~ 3600 kDa) consists of four different globin chains and three linker polypeptide chains; the quaternary structure consists of 12 dodecamers (each dodecamer consisting of three copies of the four globins) held together by the linker chains, the globins and linkers accounting for two-thirds and one-third of the total mass, respectively (Vinogradov et al., 1991). Riggs and his collaborators have determined the complete nucleotide sequence of the gene of one of the four globin chains, whose amino acid sequence had been determined earlier (Fushitani et al., 1988). A fragment of 4037 base pairs (bp) consisted of 310 bp of a 5' flanking sequence and 110 bp of a 3' flanking sequence, with the coding sequence split by two introns of 1344 and 1169 bp (Jhiang and Riggs, 1989). Although the locations of the introns are identical to those observed for mammalian globin genes (Jhiang et al., 1988), the 5' splice junction of the first intron is GC instead of GT. Characteristic conserved branch-point sequences (Keller and Noon, 1984), CTCAC and CTAAT, are found at 50 and 35 bp upstream from the 3' splice junctions of introns I and II, respectively (Jhiang and Riggs, 1989). In addition, a large polypyrimidine tract immediately upstream from the 3' splice site, present in mammalian introns, is observed in intron II but not in intron I. Both introns contain many repeat sequences similar to those observed in the intergenic region of the rabbit globin gene cluster (Cheng et al., 1986; Margot et al., 1989). Furthermore, several DNA sequences in the two introns of the Lumbricus gene exhibit identity or high homology with intergenic sequences of unrelated proteins.
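The fragment layout reported for the Lumbricus globin gene lends itself to a simple bookkeeping check. The short Python sketch below subtracts the flanking and intron lengths quoted in the text from the total fragment length to estimate how much sequence is left for the three exons. It assumes that the flanking sequences lie entirely outside the exons, so the remainder may include untranslated as well as coding sequence; the function name and argument names are illustrative only and do not come from the cited work.

```python
def exon_length(total_bp, flanking_bp, intron_bp):
    """Return the bp left for exons after removing flanking and intron sequence.

    Assumption (not stated in the source): flanking sequence lies wholly
    outside the exons, so the result may still include untranslated regions.
    """
    remainder = total_bp - sum(flanking_bp) - sum(intron_bp)
    if remainder < 0:
        raise ValueError("flanking and intron lengths exceed the fragment length")
    return remainder


# Lengths quoted in the text for the Lumbricus gene fragment
# (Jhiang and Riggs, 1989): 4037 bp total, 310 bp and 110 bp flanking,
# introns of 1344 bp and 1169 bp.
lumbricus_exon_bp = exon_length(4037, (310, 110), (1344, 1169))
print(lumbricus_exon_bp)  # 1104
```

Under these assumptions roughly 1.1 kb of the 4 kb fragment is exon sequence, so the two introns together account for well over half of the fragment.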
Since most of the simple intergenic repeats in the globin gene cluster of the rabbit (Cheng et al., 1986; Margot et al., 1989) may have resulted from transposon-based insertions (Weiner et al., 1986), it is likely that a similar mechanism is responsible for the foregoing extensive similarities (Jhiang and Riggs, 1989). In addition, Suzuki and Riggs and their collaborators have obtained the cDNA sequence of one of the three linker chains of Lumbricus terrestris hemoglobin and the complete structure of the corresponding gene (Suzuki et al., 1992). The chain of 225 residues (Mr = 25,845) contains five cysteines at intervals of six to seven residues, conserved in the amino acid sequences of the linker chains of the annelid Tylorrhynchus and the vestimentiferan Lamellibrachia (Suzuki et al., 1990a,b). The gene for this linker has three exons and two introns. However, the positions of the splice junctions for these introns do not correspond to those of the gene for heme-binding chain c (Jhiang et al., 1988; Jhiang and Riggs, 1989). The presence of several conserved cysteine residues, together with the gene structure, suggests that the linker chains of the annelid hexagonal bilayer hemoglobins are not related to globins (Suzuki et al., 1992).

(b) Structure of the gene of an intracellular globin of Anadara

Blood clams of the genera Anadara and Scapharca have four intracellular globin chains which aggregate into heterotetramers and homodimers. The amino acid sequences of the four chains of Anadara trapezia have been determined by Thompson and his collaborators (Como and Thompson, 1980; Fisher et al., 1984; Gilbert and Thompson, 1985; Mann et al., 1986). Recently, this group has isolated the gene of an unexpressed globin chain from a genomic library of Anadara trapezia and determined its nucleotide sequence (Titchen et al., 1991). The sequence determined was 3775 bp in length with two introns of 1214 and 1435 bp. Although the amino acid sequence deduced from the coding sequence appears to be an unexpressed variant of the ~ chain (Gilbert and Thompson, 1985), the 5' and 3' flanking regions of the genomic sequence do not contain any obvious impediments to expression: a possible cap site for the mRNA is present at 64 nt from the initiation codon, the consensus sequences for the intron/exon boundaries are present and there is a polyadenylation signal 179 nt from the termination codon. Whatever variation is observed in the flanking regions does not appear to exceed the interspecies variation observed so far in vertebrate globin genes.
Thus, an AT-rich region 14-29 bp upstream from the possible cap site is preceded by CAAT box regions 50-80 nt further upstream and by a GC-rich region at ~250 bp (instead of the normal 80-100 nt) upstream (Titchen et al., 1991). It should be noted that the variation in the type and placement of a CCAAT box and an ATA box and a ~200 bp displacement of the GC-rich sequence in the human θ globin gene do not block its expression (Leung et al., 1987). Furthermore, in the Anadara sequence there is a CA polyadenylation addition site 22 bp 3' to the polyadenylation signal, followed by a run of TGT and T nucleotides, characteristic of many genes whose RNAs are polyadenylated (Birnstiel et al., 1985).

(c) Bacterial globin genes

The amino acid sequence of a hemoglobin present in the Gram-negative bacterium Vitreoscilla, a member of the Beggiatoa family, has been determined recently by Wakabayashi et al. (1986); it has about 25% homology with lupin leghemoglobin. The gene (vgb) for this globin has been cloned in E. coli, where it is strongly expressed (Dikshit and Webster, 1988; Khosla and Bailey, 1988a). The hemoglobin exists in the cytoplasm and in the periplasmic space (Khosla and Bailey, 1989) as the physiologically active oxy form during aerobic respiration and can also support the aerobic growth of E. coli lacking terminal oxidase (Dikshit et al., 1991). E. coli strains carrying the vgb gene grow better under microaerobic conditions (Khosla and Bailey, 1988b; Dikshit et al., 1990). Alignment of the untranslated 3' end sequences with the corresponding regions of several leghemoglobin sequences, which are known to have conserved sequence motifs (Kuhse and Puehler, 1987), indicates a homology of 45%, only slightly less than the homology among the leghemoglobin sequences (Khosla and Bailey, 1988).

(d) Protozoan globin gene

Recently, Yamauchi and his collaborators have determined the cDNA sequence of the truncated (116 residues) hemoglobin of the protozoan Paramecium (Iwaasa et al., 1989; Yamauchi et al., 1992). The structure of the corresponding gene appears to be unique, in that two exons are separated by a short intron, shorter than any previously reported for a nuclear protein-coding gene (Yamauchi, 1992).

THE GLOBIN GENES OF THE INSECT CHIRONOMUS

(a) The 12 hemoglobins of Chironomus thummi

The family Chironomidae (midges) of the order Diptera (class Pterygota, sub-phylum Hexapoda, phylum Uniramia; Barnes et al., 1988) is one of the largest families of insects with ~10,000 species. They occur worldwide in all types of inland waters and their life-cycle consists of egg, four feeding wingless larval instars, one non-feeding wingless pupal instar, and the winged reproductive adult (Chapman, 1982). Chironomus thummi larvae feed on benthic algae and detritus and thrive in polluted and poorly oxygenated water, where they can account for up to 80% of all animals (Osmulski and Leyko, 1986). Hemoglobin synthesis in Chironomus thummi begins during the second larval instar (although published data do not exclude an earlier onset) and hemoglobin eventually constitutes up to 90% of the hemolymph proteins during the fourth larval instar (English, 1969; Laufer and Poluhowich, 1971; Laufer et al., 1982). At least 12 distinct hemoglobins of 136-151 amino acid residues are synthesized in the rough endoplasmic reticulum of fat body cells of both larvae and pupae (Bergtrom et al., 1976; Bergtrom and Robinson, 1977; Vafopoulou-Mandalos and Laufer, 1984a,b; Myers et al., 1986). Furthermore, hemoglobins are also found in pupal ovaries, freshly laid eggs and developing embryos (Trewitt and Bergtrom, 1982; Trewitt et al., 1986); in the latter case, they may serve as a nutrient source during embryonic development. Interestingly, the Ctt and other chironomid hemoglobins are highly potent human allergens (Baur et al., 1983, 1986). The amino acid sequences of the 12 Chironomus thummi thummi (Ctt) hemoglobins have been determined by Braunitzer's group (Goodman et al., 1983): five are monomeric (I, IA, III, IIIA, IV), six are homodimeric (IIβ, VI, VIIA, VIIB, VIII, IX), and X can be either monomeric or dimeric. Their oxygen binding is non-cooperative and their oxygen affinities are high, P50 = 0.12-0.6 torr (Weber et al., 1985).

(b) Many more globin genes

Niessing and his collaborators were the first to investigate the Ctt globin gene structure. They determined the nucleotide sequences of 10 globin genes present in two distinct recombinant phages isolated from a Ctt genomic library, CttG-1 (A-E) and CttG-3 (A'-E') (Antoine and Niessing, 1984; Antoine et al., 1987). These studies revealed the following. (1) All 10 globin genes have no introns; this loss of all introns may have occurred by integration into the insect genome of cDNA generated by reverse transcriptase. (2) The transcriptional orientation of the genes is non-uniform, in contrast to vertebrate globin genes within a cluster, which have the same transcriptional orientation and are arranged in the order of their developmental expression (Erbil and Niessing, 1984). (3) The borders of globin gene D (CttG-1), determined by S1 nuclease mapping, consist of the capping site at -46 nt relative to the AUG start codon and the poly(A) addition site located 81 nt after the stop codon UAA (21 nt after the T in the conserved AATAAA-box). (4) There is substantial homology among all ten 5' flanking regions within the region encompassed by nt -1 to nt -44, immediately upstream from the transcriptional start-point. The heptanucleotide GTATAAA (nt -27 to -33), which includes the TATA-box (Breathnach and Chambon, 1981), is invariant in the 5' flanking regions of all the Ctt globin genes. No obvious homologies appear further upstream, from nt -51. (5) The 3' ends of the Ctt globin genes are positioned close to 21 nt on the 3' side of the T residue of the conserved AATAAA-box (Montell et al., 1983). (6) The 5'-region of all the genes encodes an N-terminal extension of 14 aa, which is not present in the known mature Ctt globins (Goodman et al., 1983) and is typical of secretory proteins (Kreil, 1981). Like all prokaryotic and most eukaryotic signal sequences, the Ctt sequences carry a Lys at the N-terminal end (Boyd and Beckwith, 1990).
Except for the Phe-13-Leu exchange in gene D/D' and the Leu-12-Ile and Phe-6-Val exchanges in gene E/E', the sequence and length of the hydrophobic signal peptides are strictly conserved. (7) Globin genes A, A', B and B' encode globin III (CTT III; Buse et al., 1979). Globin genes C, C', D and D' code for globin IV (CTT IV; Pfletschinger et al., 1980), while globin genes E and E' code for an unreported globin which has Val at position 42, where Ala is present in all 12 chemically determined Ctt sequences (Goodman et al., 1983). In addition to the landmark results obtained by Niessing's group, the work of Bergtrom and of Schmidt and their collaborators has demonstrated that the Chironomus species are likely to have many more than a dozen globin genes. Using a cDNA probe, Bergtrom's group have isolated a genomic clone and sequenced four Ctt genes, also demonstrating the transcription of at least three of the genes by S1 nuclease protection analysis (Saffarini et al., 1985; Trewitt et al., 1988). Although the structures of the four genes and the cDNA probe are closely related, the inferred amino acid sequences of each are not identical to the chemically determined Ctt globin
VIIB sequences, to which they are most closely related. It should be noted that the published amino acid sequence (Sladić-Simić et al., 1979) suggests the existence of two similar, but not identical, components. Schmidt's group have also isolated a cluster of several globin genes of the dimeric VIIB gene subfamily from the genomic DNA library of Chironomus thummi piger (Ctp) using a heterologous Ctt gene IV probe and localized it to position F2b2 in chromosome II (Hankeln et al., 1988). Amino acid sequences inferred from the nucleotide sequences of the first two genes in the clone to be reported belong to the dimeric VIIB subfamily, and a third gene was found to code for an unreported globin (Rozynek et al., 1989). Further studies of the Ctp globin gene cluster revealed a total of nine different genes within 11 kb of DNA, and the complete nucleotide sequence of the clone revealed the same features observed for other Ctt globin genes. Comparison of the inferred amino acid sequences of Ctp globins with the 12 variants of Ctt (Goodman et al., 1983) showed that three more of the Ctp globins on this clone are also closely related, but not identical, to Ctt HbVIIB. Four additional globins were found, Ctp HbY, HbW (reported earlier by Rozynek et al., 1989), HbV and HbZ, for which corresponding proteins have not yet been reported (Hankeln et al., 1991). Rozynek et al. (1991) have sequenced a recombinant clone from a genomic Chironomus tentans DNA library, which hybridized with a probe containing the Ctp HbVIIB-9 gene. Of the five open reading frames (ORF) present, A, C, D and E had all the essential regulatory sequences, indicating them to be putative globin genes, coding for globins related to the dimeric Ctt globins. ORF B does not have a start codon and is truncated at the 3' end, suggesting that it is a pseudogene.
Because the inferred globins found in these studies do not correspond exactly to any of the reported Ctt globin sequences and since there are at least five slightly different genes for only one protein variant, HbVIIB, in both C. thummi thummi and C. thummi piger, it is possible that there may be over 40 C. thummi globin genes (Hankeln et al., 1991; Rozynek et al., 1991). The existence of so many globin gene copies is supported by recently determined sequences of several other Ctt variant globin genes (Kao and Bergtrom, 1992).

(c) Not all Chironomus genes are intronless

Recently, Kao and Bergtrom (1992) have obtained the nucleotide sequences of three Ctt HbIIβ genes. They found a putative 64 bp intron with suitable donor and acceptor sites, which interrupts each IIβ gene at the same position, but not at one of the absolutely conserved positions of vertebrate globin genes. It should be noted that, based on amino acid sequence comparisons (Goodman et al., 1983), Ctt HbIIβ is not the most ancient Ctt globin. Thus, among the Chironomus globin genes, acquisition and location of an intron in the HbIIβ gene appears to be a later evolutionary event (Kao and Bergtrom, 1992).

(d) Chromosomal location of globin genes

Two Hb gene clusters cloned from the genomic DNA of Chironomus thummi were localized in the polytene chromosomes of 13 Chironomus species (Schmidt et al., 1988). The Hb gene cluster containing the genes for the Ctt monomeric Hbs III and IV and unknown globin E (Antoine et al., 1987) hybridized in all species to the end of chromosome arm E (region A1b2 of Ctt chromosome III), and the gene cluster containing the genes for the dimeric Hb VIIB proteins could be localized to chromosome arm D (region F2b3 of Ctt chromosome II) (Schmidt et al., 1988).

(e) Differential regulation of Chironomus genes

Hemoglobin synthesis in Chironomus thummi is differentially regulated: fourth instar-specific HbIII accumulates more rapidly than Hb present in both the third and fourth instars (Laufer et al., 1976). Roles for the juvenile hormone and for 20-hydroxyecdysone (20HE) in the selective induction and general repression, respectively, of hemoglobin synthesis were suggested by experiments in which larvae or cultured larval tissues were exposed to insect hormones (Vafopoulou-Mandalos and Laufer, 1982; 1984a,b). Bergtrom's group have used the S1 nuclease protection assay to measure the alterations in the steady-state levels of six Ctt globin mRNAs during the development of the fourth instar (Saffarini et al., 1991). In addition to low, post-molt levels of each mRNA, supporting repression by 20HE, two distinct patterns of change were observed: transcripts of GbIII and GbVI, which do not occur in the third instar, reached high levels by Day 2 of the fourth instar and remained high, while transcripts of GbI, GbIV, GbVIIB-4 and GbVIIB-5, which are present in the third instar, reached a peak on Days 2-3 of the fourth instar and declined thereafter to basal levels. From these results, at least two of the closely linked globin genes (GbIII and GbIV, coding for monomeric hemoglobins) are not coordinately expressed during development. Recently, a new protein was found to occur only in the hemolymph of female Chironomus, appearing at the fourth instar and persisting through subsequent stages. This protein consists of two subunits of equal size, a hemoglobin and a vitellogenin (Fabb et al., 1990).

GENES OF CHIMERIC PROTEINS COMPRISING A HEME-BINDING DOMAIN

Several chimeric, two-domain proteins containing a globin-like, heme-binding N-terminal moiety and a C-terminal protein portion of a similar size have recently been demonstrated to exist in bacteria such as E. coli (Vasudevan et al., 1991), Alcaligenes eutrophus (Weihs et al., 1989) and Rhizobium meliloti (Gilles-Gonzalez et al., 1991) and in the yeasts Candida (Oshino et al., 1973; Iwaasa et al., 1991) and Saccharomyces (Zhu and Riggs, 1992). Riggs has suggested that these chimeric molecules represent a hitherto unrecognised evolutionary pathway for hemoglobin: that of a multi-purpose globin or heme-binding domain, attached to a variety of unrelated proteins with diverse functions: dihydropteridine reductase in the case of E. coli (Vasudevan et al., 1991), a kinase in Rhizobium (David et al., 1988; Gilles-Gonzalez et al., 1991) and a flavoprotein reductase in Alcaligenes (Weihs et al., 1989), Saccharomyces and Candida (Riggs, 1992).

(a) E. coli globin-like gene

In the course of identification of genes encoding dihydropteridine reductase activity in E. coli, a chromosomal DNA fragment was found to direct synthesis of two chains with Mrs of 44 and 46 kDa (Vasudevan et al., 1991). The nucleotide sequence of the gene coding for the smaller of the two chains (hmp) was determined and the deduced amino acid sequence of the N-terminal 139 residues was found to have 46% identity with Vitreoscilla hemoglobin. The high homology of the promoter region of hmp to the consensus E. coli promoter sequences (Reznikoff and McClure, 1986) and the complementarity of the ribosome-binding site to the 3' end of 16S rRNA (Stormo, 1986) are consistent with the observed high level of expression of the globin in strains carrying multicopy hmp+ plasmids (Vasudevan et al., 1991).

(b) Yeast globin-like genes

Zhu and Riggs (1992) have obtained the sequence of the chimeric hemoglobin of Saccharomyces cerevisiae from the nucleotide sequences of cDNA and genomic DNA. It consists of 399 residues, and the N-terminal globin-like region and the flavin-binding C-terminal domain have 44 and 36% identity, respectively, with the corresponding regions of the E. coli hmp protein (Vasudevan et al., 1991). The nucleotide sequence of a 1980 bp genomic DNA fragment indicated the absence of introns and the presence of TATA and CCAAT boxes and two possible GATA sites in the 5' flanking region. Since it is known that hypoxia-induced globin synthesis in mammals and birds is regulated by an erythroid-specific enhancer (GATA box) and its binding protein (GATA-1), the presence of two putative GATA sites, together with the recent observation of a GATA-1-like protein in yeast (Cunningham and Cooper, 1991), raises the possibility that synthesis of yeast and vertebrate globins may involve related control mechanisms (Zhu and Riggs, 1992).
The homology between the yeast and bacterial chimeric proteins, according to Riggs, indicates that such two-domain proteins may have been conserved for as long as 1800 Myr, a recent estimate of the time of divergence of prokaryotes and eukaryotes (Doolittle et al., 1989), and argues against any horizontal transmission of globin genes (Riggs, 1992).

GENE STRUCTURE OF TWO-DOMAIN GLOBINS

(a) Nematode globin genes

Recently, Pohajdak and his collaborators have obtained the complete cDNA sequence of a two-domain hemoglobin from the nematode Pseudoterranova decipiens, which parasitizes cod and seal (Dixon et al., 1991). The polypeptide chain of this extracellular hemoglobin consists of 133 amino acids, has an 18 amino acid hydrophobic leader sequence and has the potential to bind two molecules of heme. The quaternary structure of the hemoglobin is probably similar to that of the closely related Ascaris suum, which is known to be an octamer with a Mr ~ 330 kDa (Darawshe et al., 1987; Darawshe and Daniel, 1991). Furthermore, they have also isolated and determined the complete sequence of a genomic copy of the gene
for this hemoglobin (Dixon et al., 1992). (1) This gene consists of seven exons and six introns. The introns vary in size from 164 to 335 base pairs and all exon/intron boundaries have the expected splice sites. (2) The first intron occurs at position 21 of the unprocessed protein, indicating that the nematode hemoglobin gene organization is similar to that of other eucaryotic extracellular protein coding genes in that it contains an intron following its secretory leader sequence, in contrast to the genes coding for invertebrate hemoglobins of Lumbricus (Jhiang et al., 1988), Anadara (Titchen et al., 1991) and Barbatia (Naito et al., 1991). (3) Intron 3 starts at amino acid residue 65, close to the position of the central intron of plant hemoglobin genes (Jensen et al., 1981; Landsmann et al., 1986; Bogusz et al., 1988). Since the position of introns can vary within mammalian globin genes (Hardison, 1991), it was suggested that the nematode hemoglobin gene contains an intron in a similar position to that found previously only in plants (Dixon et al., 1992). The presence of this intron indicates that this nematode gene, as well as plant hemoglobin genes, has retained the chromosomal structure of an ancestral hemoglobin gene. (4) The amino acid sequence of the N-terminal domain ends after exon 5 and the protein sequence repeats at the start of exon 6. The coding sequence for the C-terminal domain contains only one intron at amino acid position 181, corresponding to the same position as intron 2. The deletion of the central intron ("plant" intron) and of the subsequent intron may have occurred as the result of an unequal cross-over event with a reverse transcribed copy of a partially spliced mRNA (Dixon et al., 1992). Such a loss of introns has been observed in many genes, including globins (Nishioka et al., 1980; Vanin et al., 1980), and often results in processed pseudogenes which are usually incorporated into other regions of the chromosome (Vanin, 1984).
A model for the head-to-head duplication of the nematode globin gene was postulated, based on similar coding regions in the N- and C-terminal ends of the ancestral gene (Dixon et al., 1992). The cDNA sequence of a putative single chain globin of the parasitic nematode Trichostrongylus colubriformis has been recently determined by Frenkel et al. (1992) as part of an effort to identify and characterize the host-protective antigens present in the excretory and secretory products of the parasite. The deduced sequence has a 29% identity with the two domains of Pseudoterranova hemoglobin and ~20% identity with Chironomus HbIIβ and with human alpha-globin. A recent report on the progress of the determination of the genome structure of Caenorhabditis elegans by Sulston et al. (1992) mentions a putative globin gene which codes for a globin-like protein related to the Trichostrongylus host-protective antigen. There is no report of the level of expression, if any, of this gene in Caenorhabditis.

(b) Gene of the intracellular hemoglobin of the clam Barbatia

The erythrocytes of the clam Barbatia reeveana contain three hemoglobins, two single-chain and a polymer of two-domain subunits; the cDNA-derived amino acid sequences of the two single-domain chains
(Riggs and Riggs, 1990) show that they are only distantly related to the two-domain chain (Riggs et al., 1986). The latter consists of 308 residues and two domains with 78% identity connected by two Lys residues. Riggs and his collaborators have determined the sequence of 134,370bp of the genomic DNA encoding the two-domain globin (Naito et al., 1991, 1992). In addition to the four normal introns, two in each domain, two novel introns were found: one, a "precoding" intron, is located two bases 5' from the start codon and the other, a "bridge" intron, separates the DNA sequences encoding the two domains. A stretch of 64 nucleotides at the 5' end of the bridge intron is homologous to the sequence in the 3' non-coding region beyond the termination codon, and the intron also contains a characteristic polyadenylation signal sequence whose flanking regions are also very similar to the cDNA sequences encoding globin chains (intron bases 256-282). The results suggest that the 5' part of the bridge intron is derived from the 3' non-coding region of the gene for an ancestral single-domain chain and that the 3' part of the bridge intron is derived from the corresponding 3' part of the precoding intron of the gene for the ancestral single-domain globin (Naito et al., 1991).

PLANT GLOBIN GENES: LEGUME AND NON-LEGUME

(a) Symbiotic and non-symbiotic nitrogen fixation

Although nitrogen fixation, the reduction of atmospheric dinitrogen to ammonia, does not occur in eukaryotes, it occurs widely in eubacteria and archaeobacteria (Young, 1992). Apart from free-living nitrogen-fixing bacteria, such as Klebsiella pneumoniae, many members of the eubacteria form two types of symbiosis with plants. These are associative symbiosis and symbiosis concomitant with the formation of differentiated nitrogen-fixing structures, namely nodules in the roots or stems of the plant host. The latter type of symbiosis occurs among the actinomycetes (Thallobacteria) and the proteobacteria. Actinomycetes in the single genus Frankia use the actinorhizal plants as hosts. The latter belong to 24 genera within eight families of woody angiosperms; the most widely distributed are Alnus, Elaeagnus and Casuarina (Baker and Mullin, 1992). The proteobacteria participating in legume root-nodule symbiosis were originally placed in the genus Rhizobium; however, it has become evident that further subdivision is necessary (Young, 1992). The genera Bradyrhizobium and Azorhizobium each consist of only one named species: B. japonicum, which nodulates soybean (Glycine max), and A. caulinodans, which nodulates the stems of Sesbania. Several species of Rhizobium are recognized. The symbiotic functions of Rhizobium leguminosarum are plasmid-encoded and the same chromosomal strain can nodulate legumes in any one of three groups, depending on the host-specificity type or biovar. For biovar trifolii, the hosts are clovers (Trifolium); for biovar phaseoli, the hosts are beans (Phaseolus); and for biovar viciae, the hosts are peas (Pisum), vetches (Vicia) and lentils (Lens) (Young, 1985; Pinero et al., 1988). Alfalfa (Medicago sativa) and related legumes are nodulated by R. meliloti; most of the isolates appear to fall into

two distinct genetic groups (Eardly et al., 1990). It is likely that the taxonomy of the foregoing symbionts will continue to change.

(b) Genes involved in symbiosis: leghemoglobins are late nodulins

In the root nodules the symbiont differentiates into a bacteroid form, capable of synthesizing the nitrogenase enzyme necessary for atmospheric nitrogen reduction and thus able to provide the plant with a supply of inorganic nitrogen as ammonia (Long, 1989). The three main groups of symbiont genes are the nif, nod and fix genes: they are defined by the phenotype of the symbiosis (Barbour et al., 1992). The nif genes code for the nitrogenase enzyme complex, and the nod and fix mutants (Nod- and Fix-) are characterized by absent or poorly forming nodules and by nodules incapable of nitrogen fixation, respectively. Furthermore, a number of plant proteins, called nodulins (Legocki and Verma, 1980; van Kammen, 1984; Govers et al., 1987), are produced in the root nodule. They differ in their temporal expression, the early or class I nodulins being involved in the infection process or in early nodule development and the late or class II nodulins being involved in the functioning of the nodule (Fuller and Verma, 1984; Verma et al., 1986; Verma and Delauney, 1988). Leghemoglobins, first observed in 1939 (Kubo, 1939), belong to the latter group and are detected at the onset of nitrogen fixation. These single-chain hemoglobins have a high oxygen affinity, are localized in the cytoplasm (Robertson et al., 1984) and serve to protect the oxygen-sensitive bacterial nitrogenase as well as to provide a supply of oxygen necessary for most other aerobic functions (Appleby, 1984).

(c) Legume globins are encoded by multiple genes

(i) Soybean (Glycine max) leghemoglobin genes. The hemoglobins of legume nodules are encoded by the plant genome (Baulcombe and Verma, 1978).
Soybean (Glycine max) nodules contain four major species of hemoglobin, called lba, lbc1, lbc2 and lbc3, and several minor species (Fuchsman and Appleby, 1979). The soybean leghemoglobins are encoded by a small family of genes (Jensen et al., 1981; Sullivan et al., 1981; Marcker et al., 1981; Hyldig-Nielsen et al., 1982; Brisson and Verma, 1982; Wiborg et al., 1982, 1983). Six leghemoglobin genes are arranged in two independent clusters in the soybean genome: one cluster contains two genes in the order 5'-ψ2Lb-Lbc2-3' and the other four genes in the order 5'-Lba-Lbc1-ψ1Lb-Lbc3-3'. The genes in both clusters have the same transcriptional polarity (Lee et al., 1983; Bojsen et al., 1983). The following observations can be made from a comparison of the intragenic organizations of the mammalian globin genes (Efstratiadis et al., 1980; Maniatis et al., 1980; Margot et al., 1989) and the leghemoglobin genes. (1) The principal difference is the presence of an extra intron within the exon 2 region. Otherwise, the splicing points of the first and third introns in leghemoglobins coincide with the positions in mammalian globin genes; however, no significant stretches of homology occur between the plant and mammalian introns. In addition, the plant introns exhibit greater variation in size. (2) Apart from the polyadenylation site (Brisson and Verma, 1982; Hyldig-Nielsen et al., 1982), the 3' flanking regions are not homologous. A recent survey of these regions in 56 nuclear plant genes (Joshi, 1987) suggested several differences in detail between plant and animal genes. (3) Two stretches of significant homology occur in the 5' coding region (Christensen et al., 1989) in addition to the general eukaryotic TATA and CAAT boxes: a ~30 bp sequence in the region surrounding the cap site and another region near position -100 (H'). The imperfect tandemly repeated sequence H' is known to be essential for optimum promoter function in rabbit beta-globin genes (Dierks et al., 1983). (4) Comparison of the coding regions of the leghemoglobin and mammalian globin genes suggests that the plant and animal globins diverged sometime between 900 and 1400 Myrs ago, based on the number of base substitutions per nucleotide site and the assumption of a constant evolutionary rate and of 500 Myrs as the date for the divergence of alpha- and beta-globins (Brown et al., 1984).

(ii) Sesbania leghemoglobin genes. Sesbania rostrata is a tropical legume distantly related to soybean, which upon infection with Azorhizobium caulinodans forms stem, as well as root, nodules. The latter contain hemoglobin consisting of at least seven components, of which one, LbVI, appears to be five times more abundant in stem than in root nodules (Bogusz et al., 1987). Strittmatter et al. (1989) have isolated two different clones, LbSr1 and LbSr6, from a stem nodule-specific cDNA library, which show ORFs coding for globins of 148 and 146 amino acids, respectively; the two sequences exhibit a homology of 73%. The extent of this homology is less than that among the soybean leghemoglobin cDNA sequences, which reaches ~90%. Furthermore, LbSr1 and LbSr6 are 70% homologous to soybean hemoglobins lbc2 and lba, respectively. The Sesbania globin LbSr1 appears to correspond to component LbII (Kortt et al., 1987) observed by Bogusz et al.
(1987) and globin LbSr6 to component LbIV. De Bruijn and his collaborators have isolated two genomic loci from Sesbania rostrata (Metz et al., 1988). The amino acid sequences deduced from the nucleotide sequences of Srglb2 and Srglb3 show that they encode components LbVI and LbVII, respectively (Metz et al., 1988; Welters et al., 1989). The structures of these loci are similar to each other and to that of the Glycine max leghemoglobin genes. The sequences at the intron-exon boundaries are homologous to those found for 18 Glycine gene introns (Slightom et al., 1983) and for other eukaryotes (Breathnach and Chambon, 1981). All the expected regulatory sequences, such as the start/stop codons, intron/exon boundaries, CAAT and TATA boxes and transcription start and stop sites, are located at the proper positions in Srglb2.

(iii) Alfalfa leghemoglobin genes. Several hemoglobins are present in Medicago sativa nodules (Jing et al., 1982). The cDNA sequences of two components corresponding to LbI and LbIII have been determined (Davidowitz et al., 1989; Kiss et al., 1987). Barker and his collaborators have investigated the leghemoglobin genes from Medicago truncatula and their expression. This family appears to comprise three to four genes which can be divided into two groups (Barker et al., 1988, 1990). Two representative genes, Mtlb1 and Mtlb2, have been isolated; they

code for globins of 146 and 145 amino acid residues, respectively (Gallusci et al., 1991). The homology between the coding regions of the two genes is 86% at the nucleotide level and 82% at the amino acid level. RNase protection experiments showed that both genes were transcriptionally active. The leghemoglobin transcripts can first be detected at day 5 after inoculation, with Mtlb1 mRNA being present at 3-fold higher steady-state levels than Mtlb2 mRNA up to day 28 (Gallusci et al., 1991).

(d) Non-legume globin genes: actinorhizal symbiosis

Although actinorhizal plants rival legumes in the amount of nitrogen that they fix on a global basis, they have not been studied as much. In contrast to free-living rhizobia, which can fix nitrogen only at very low partial pressures of oxygen, free-living Frankia can fix nitrogen at atmospheric oxygen pressures (Tjepkema et al., 1986). Although hemoglobin occurs in all effective legume nodules, it is not ubiquitous in actinorhizal symbiosis (Tjepkema, 1983). The highest concentrations were observed in the nodules of Casuarina and Myrica (Fleming et al., 1987; Tjepkema and Asa, 1987; Baker and Mullin, 1992). Decreasing concentrations of CO-reactive heme were found in nodules of Alnus, Comptonia, Ceanothus, Hippophae, Elaeagnus, Coriaria and Purshia, but not in Datisca (Tjepkema and Asa, 1987). Although DNA sequences hybridizing with soybean leghemoglobin cDNA probes have been reported in Alnus, Betula, Casuarina, Ceanothus, Elaeagnus and Myrica (Roberts et al., 1985; Hattori and Johnson, 1985), no detailed confirmation of these results has appeared. Appleby and his collaborators have isolated a hemoglobin (HbI) from Casuarina glauca/Frankia nodules and determined its amino acid sequence (Fleming et al., 1987; Kortt et al., 1988). Recently, Marcker and his collaborators have isolated a Casuarina glauca hemoglobin gene by screening a genomic library with a genomic Parasponia hemoglobin probe (Christensen et al., 1991).
This HbII gene codes for a polypeptide of 159 amino acid residues which, surprisingly, shows only 53% homology to the Casuarina HbI, but a much higher homology, ~80%, to Trema and Parasponia hemoglobins. Furthermore, the intron positions in the HbII gene are the same as in other plant hemoglobins (Landsmann et al., 1986). Comparison of the promoter regions of Casuarina, Parasponia and Trema globin genes with those of the soybean Lba and Lbc3 genes shows that the putative nodulin consensus sequences 5'-AAAGAT-3' and 5'-CTCTT-3', described in the promoters of leghemoglobin genes (Stougaard et al., 1987a, 1990) and other soybean nodulin genes (Sandal et al., 1987), are partially conserved in the Casuarina HbII promoter region, also in inverted form, similarly to the Trema and Parasponia promoters (Bogusz et al., 1990).

(e) Symbiosis-dependent non-legume globin genes

The genus Parasponia belongs to the family Ulmaceae (order Urticales), consisting of about 15 genera and 200 species of worldwide distribution (Becking, 1992). It appears to be the only non-legume that forms root nodules in symbiosis with Bradyrhizobium and Rhizobium. Hemoglobin is found in the

nodules of the Parasponia/Rhizobium symbiosis (Appleby et al., 1983) and its sequence has been determined (Kortt et al., 1985). In contrast to other plant globin gene families, only one globin gene locus appears to exist in Parasponia andersonii. The structure of a complete gene coding for its 161-amino-acid hemoglobin has been determined (Landsmann et al., 1986). Although the sequences of two different clones isolated from genomic DNA differ at four nucleotides, the authors consider them to represent allelic heterogeneity at a single locus rather than two different genes.

(f) Symbiosis-independent, non-legume plant globin genes

Bogusz et al. (1988) used a Parasponia cDNA probe to detect globin-like sequences in the genomes of Trema tomentosa and Celtis australis (Ulmaceae) and related plants, which are incapable of forming symbiotic associations with any of the 98 representative strains of Rhizobium and Bradyrhizobium. The Trema globin gene sequence had four exons and three introns at positions identical to those in the Parasponia gene. Although the exon sequences of the two genes have 93% nucleotide similarity, the introns, untranslated leader and 3'-untranslated sequences exhibited ~80% similarity. Northern blot hybridization of Parasponia hemoglobin cDNA to the total mRNAs isolated from Parasponia nodules, roots and leaves, and from Trema roots and leaves, did not detect any globin mRNA in the leaves of either plant. However, globin mRNA was present in the roots of both plants, albeit at ~1000-fold lower levels than in the Parasponia nodules (Bogusz et al., 1988). Furthermore, the globin mRNA is translated in Trema roots into a monomeric globin of 161 amino acid residues. Although the Parasponia globin is of the same size and highly homologous with the Trema globin, it is homodimeric (Wittenberg et al., 1986). These results imply that globins are found in plant roots and may have a physiological function.
In soybean, repeated efforts to detect expression of a hemoglobin gene in roots have been unsuccessful (Stougaard et al., 1987b; Appleby et al., 1988a). The concentrations of hemoglobin in the roots of non-nodulated plants (<1 × 10⁻⁷ M) are probably too low to permit them to function in the facilitated diffusion of oxygen to root mitochondria. Appleby et al. (1988b) suggest that, instead, they may function as signal molecules to indicate oxygen deficiency and the consequent need to shift root metabolism from an oxidative to a fermentative pathway of energy generation. Wright and his collaborators have suggested that leghemoglobin may also function as an electron donor (Bakan et al., 1991).

(g) Regulation of plant globin gene expression

(i) Sequential expression of lb genes. Verma et al. (1979) showed that only Lbc mRNAs accumulate in very young soybean (Glycine max) nodules, in agreement with the appearance of lbc3 and lbc1 globins before the lba globin (Fuchsman and Appleby, 1979). Alterations in the expression of leghemoglobins from the pea (Pisum sativum) with nodule development have also been observed (Uheda and Syono, 1982).

Marcker et al. (1984) demonstrated that the activation of the soybean globin genes occurs exponentially within 7-8 days of infection. The genes are activated sequentially in the opposite order to that in which they are arranged in the soybean genome, lbc3 before lbc1. At ~12 days post-infection there is a large increase in the transcription of all genes except lbc2, which persists for several weeks. The low level of expression of lbc2 may be explained on the basis that it is a fused gene consisting of a 5' lbc1 end and a 3' lbc3 end (Jensen et al., 1983) which has an impaired promoter region.

(ii) Symbiont regulation of lb gene expression. Although it is known that the host plant genome encodes apoleghemoglobin (Brisson and Verma, 1982), the biosynthetic origin of the prosthetic heme group remains unclear. Early measurements of the activities of heme biosynthetic enzymes indicated that heme is synthesized by the symbiont (Nadler and Avissar, 1977). Bradyrhizobium japonicum hemA mutants deficient in delta-aminolevulinic acid synthase, which catalyzes the first committed step in heme biosynthesis, elicited Fix+ nodules in soybean which produced leghemoglobin (Guerinot and Chelm, 1986). On the other hand, a Bradyrhizobium japonicum mutant deficient in protoporphyrinogen oxidase, which catalyzes a later step in heme biosynthesis, elicited Fix- soybean nodules that produced leghemoglobin apoprotein but not holoprotein (O'Brian et al., 1987). Conflicting results have also been obtained with different Rhizobium meliloti hemA mutants and alfalfa, Medicago sativa: nearly wild-type levels of leghemoglobin (Mohapatra and Puhler, 1986) and a complete absence of leghemoglobin (de Bruijn et al., 1989). Furthermore, it has been suggested that different steps of heme biosynthesis may be partitioned between the bacteroid and the plant-derived peribacteroid membrane (Jacobs et al., 1989). Dickstein et al.
(1991) re-examined the nodules elicited by the Rhizobium meliloti hemA mutants used earlier by Mohapatra and Puhler (1986) and de Bruijn et al. (1989): they found the nodules to have atypical morphology and infection thread development and to be unable to express any of six late nodulin genes tested by hybridization, including leghemoglobin. They suggest that it is important to establish the stage of development at which a mutant nodule is arrested.

(iii) DNA elements and protein factors in lb gene expression. Marcker and his collaborators have carried out a deletion analysis of the interactions between DNA sequence motifs involved in the root nodule-specific expression of the soybean leghemoglobin lbc3 gene in transgenic Lotus corniculatus plants, which defined several elements (Stougaard et al., 1986, 1987a,b): (1) a strong positive element required for maximum expression at nt -1090 to nt -947; (2) a weaker positive element sufficient for residual promoter activity at nt -230 to nt -170; (3) an organ-specific element located at nt -139 to -102; and (4) a negative element at nt -102 to -49 (Stougaard et al., 1987a). Efficient enhancement by the strong positive element requires the presence of the organ-specific element. The latter contains the putative nodulin consensus sequences 5'-AAAGAT and 5'-CTCTT and does not enhance expression by itself (Stougaard et al., 1990). A study

of protein-DNA interactions, using nuclear extracts of soybean nodules, leaves and roots, has identified two distinct elements in the 5' upstream region of the lbc3 gene which map at nt -223 to -246 and -161 to -176 (Jensen et al., 1988); the trans-acting nuclear protein factor is nodule-specific and binds to both elements, albeit with different affinities. Identical patterns of leghemoglobin mRNA accumulation during the first 10 days of nodule formation on stems and roots of Sesbania were observed by Strittmatter et al. (1989) in Northern hybridization experiments using the insert of cDNA clone LbSr1 as a leghemoglobin-specific probe. Binding of the NAT2 factor at the weak positive elements located in the nt -945 and -139 regions, one of which is conserved in Sesbania legHb genes (Metz et al., 1988; Welters et al., 1989), is not required for high expression levels or specific control. De Bruijn et al. (1989) examined the promoter region of the glb3 leghemoglobin gene in the Sesbania rostrata/Azorhizobium caulinodans symbiotic system for the presence of conserved motifs homologous to binding sites 1 and 2 of the soybean lbc3 promoter region, found by Jensen et al. (1988) to interact with a trans-acting factor present in soybean nodule nuclear extracts. In addition to the binding sites identified earlier by Metz et al. (1988), other sites were observed to interact with trans-acting factors. One of the latter appears to be developmentally regulated and tissue-specific. Furthermore, the Srglb3 CAAT and TATA box region was found to be required for nodule-specific expression and at least two distinct regions with enhancer activity were located upstream at positions nt -450 to -650 and nt -650 to -1500 (de Bruijn et al., 1989).
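The promoter coordinates quoted in this subsection are easier to compare when tabulated. The following minimal Python sketch lists the four lbc3 elements from the deletion analysis (Stougaard et al., 1987a) and reports which of them overlap the two protein-binding sites mapped by Jensen et al. (1988). The names `Element`, `LBC3_ELEMENTS` and `overlapping` are hypothetical and introduced only for illustration; the strong positive element is assumed to span nt -1090 to -947, and all coordinates are taken at face value from the text.

```python
from typing import List, NamedTuple


class Element(NamedTuple):
    """A promoter element with upstream coordinates as quoted in the text."""
    name: str
    start: int  # more distal (more negative) boundary
    end: int    # more proximal boundary


# Coordinates as quoted from Stougaard et al. (1987a).
# Assumption: the strong positive element is the interval -1090..-947.
LBC3_ELEMENTS = [
    Element("strong positive", -1090, -947),
    Element("weak positive", -230, -170),
    Element("organ-specific", -139, -102),
    Element("negative", -102, -49),
]


def overlapping(a: int, b: int, elements=LBC3_ELEMENTS) -> List[str]:
    """Return names of elements sharing at least one position with [a, b]."""
    lo, hi = sorted((a, b))  # accept the boundaries in either order
    return [e.name for e in elements if e.start <= hi and lo <= e.end]


if __name__ == "__main__":
    # Binding sites reported by Jensen et al. (1988).
    for label, site in (("site 1", (-223, -246)), ("site 2", (-161, -176))):
        print(label, site, overlapping(*site))
```

With the coordinates as quoted, both binding sites overlap only the weak positive element (nt -230 to -170); the sketch says nothing about function and is simply an interval comparison.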
The nodule-regulated Parasponia andersonii hemoglobin gene is expressed at low levels in roots (Bogusz et al., 1988; Landsmann et al., 1988), suggesting that this gene lacks a negative element in the promoter region and is only positively regulated in nodules (Stougaard et al., 1990). Since deletion of the lbc3 negative element (nt -102 to -49) does not lead to unregulated expression, there may be subtle differences between soybean lbc3 and Parasponia globin gene regulation (Stougaard et al., 1990).

(h) The evolution of plant globin genes

The observation of hemoglobins only in legume nodules led early on to proposals that leghemoglobins represented a case of convergent evolution with animal hemoglobins (Appleby, 1974) or of horizontal gene transmission from a bacterial or animal origin (Appleby, 1974; Jeffreys, 1981; Hyldig-Nielsen et al., 1982). The determination of soybean leghemoglobin gene organization (Jensen et al., 1981; Hyldig-Nielsen et al., 1982; Brown et al., 1984) suggested that leghemoglobins and animal hemoglobins had a common origin. In particular, the finding of a third intron in the leghemoglobin gene was in agreement with the results of a domain analysis of mouse hemoglobins by Go (1981) and the conclusion that ancestral hemoglobin genes contained a third intron. Furthermore, gene sequence analysis pointed to 900-1400 Myrs ago as the time of divergence of plant and animal hemoglobins (Brown et al., 1984). It has been suggested

that leghemoglobins could be the result of an early horizontal transmission of an animal or insect globin gene to plants (Jeffreys, 1981). Transfer of an insect gene appears unlikely in view of the intronless structure of the Chironomus hemoglobin genes sequenced so far (Antoine and Niessing, 1984; Trewitt et al., 1988; Hankeln et al., 1991; Rozynek et al., 1991). Nematodes are possible animal donors, since they parasitize plants (Zuckerman et al., 1971). The Pseudoterranova globin gene structure, discussed earlier, neither supports nor rules out the possibility of horizontal transmission from a nematode. It is evident that the determination of the structure of a nematode single-domain globin gene is desirable to help resolve this issue. Furthermore, in order to explain the now-known presence of hemoglobin in many non-leguminous plants, it becomes necessary to invoke repeated acts of horizontal transmission, which appears implausible. The most likely explanation is the one proposed by Appleby et al. (1988a, 1988b), namely, vertical descent from a proto-organism ancestral to plants and animals. The survival of hemoglobin genes in legumes and non-leguminous plants over a period of 1400 Myrs, since the divergence of plant and animal lines, implies that they must have had until recently, or still have, critical functions other than oxygen transport in plant nodules; otherwise they would have disappeared as a consequence of random mutation (Appleby et al., 1990).
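The molecular-clock reasoning behind the 900-1400 Myr estimate quoted in this section can be sketched in a few lines. Under the stated assumption of a constant evolutionary rate, a divergence time scales linearly with the number of base substitutions per nucleotide site, calibrated against the 500 Myr date assumed for the alpha/beta-globin divergence. The function name `divergence_time` and the input values below are hypothetical illustrations, not data from Brown et al. (1984), whose actual procedure may have included corrections not shown here.

```python
def divergence_time(k_pair: float, k_calibration: float,
                    t_calibration_myr: float = 500.0) -> float:
    """Linear molecular-clock estimate of a divergence time in Myr.

    k_pair        -- substitutions per site between the lineages of interest
    k_calibration -- substitutions per site between the calibration pair
                     (here, alpha- versus beta-globin)
    t_calibration_myr -- assumed divergence date of the calibration pair
    """
    if k_pair < 0 or k_calibration <= 0:
        raise ValueError("substitution counts must be positive")
    # Constant rate: time is proportional to accumulated substitutions.
    return t_calibration_myr * k_pair / k_calibration


if __name__ == "__main__":
    # Hypothetical ratios of plant/animal to alpha/beta substitutions.
    for ratio in (1.8, 2.8):
        print(ratio, divergence_time(ratio, 1.0))
```

In this simple linear model, a plant/animal divergence of 900-1400 Myr corresponds to plant and animal globin genes having accumulated roughly 1.8 to 2.8 times as many substitutions per site as the alpha- and beta-globin genes.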

CONCLUSION: ARE GLOBIN GENES UBIQUITOUS?

The presence of a variety of globin-like genes in many different prokaryotes, in simple eukaryotes such as yeasts and protozoa, in a large number of metazoan phyla and in leguminous and non-leguminous plants in the presence and absence of symbiosis, demonstrates that globin-like molecules may be much more widespread than previously thought and raises the possibility that globin-like genes may be ubiquitous (Riggs, 1991). In this connection, the occurrence of globin genes which are either not expressed or are expressed at very low levels in organisms as diverse as Chironomus, Anadara, E. coli and Caenorhabditis, mentioned earlier, suggests that a search for globin-like genes in phyla in which hemoglobin is not known to be present, or is present only in a few groups, may prove to be fruitful. It is appropriate to note here the recent finding of an ORF in the nucleotide sequence of chromosome 3 of Saccharomyces cerevisiae which is homologous to the nif gene of the nitrogen-fixing bacterium Anabaena (Oliver et al., 1992)! If globin or globin-like genes are indeed ubiquitous, then the elucidation of the mechanisms of the suppression of their expression and of whether their primary structures persisted for long evolutionary times, albeit with alterations due to random mutation or selection of new functions, would provide much scope for future investigation.

Acknowledgements--We would like to acknowledge helpful suggestions, criticisms and corrections provided to us during the preparation of this review by Drs C. A. Appleby, D. G. Barker, G. Bergtrom, M. Goodman, J. Niessing, G. Strittmatter and D. P. S. Verma. This work was supported

in part by National Institutes of Health grants DK 38674 (SNV) and DK 30382 (DAW) and by grants from the Research Council of Canada and the Department of Fisheries and Oceans Subvention Program (BP).

REFERENCES

Antoine M. and Niessing J. (1984) Intron-less globin genes in the insect Chironomus thummi thummi. Nature 310, 795-798.
Antoine M., Erbil C., Munch E., Schnell S. and Niessing J. (1987) Genomic organization and primary structure of five homologous pairs of intronless genes encoding secretory globins from the insect Chironomus thummi thummi. Gene 56, 41-51.
Appleby C. A. (1974) Leghemoglobin. In The Biology of Nitrogen Fixation (Edited by Quispel A.), pp. 521-554. North-Holland, Amsterdam.
Appleby C. A. (1984) Leghemoglobin and Rhizobium respiration. A. Rev. Plant Physiol. 35, 443-478.
Appleby C. A., Bogusz D., Dennis E. S., Fleming A. I., Landsmann J. and Peacock W. J. (1988a) The vertical evolution of plant haemoglobin genes. In Nitrogen Fixation: Hundred Years After (Edited by Bothe H., de Bruijn F. J. and Newton W. E.), pp. 623-628. Gustav Fischer, Stuttgart.
Appleby C. A., Bogusz D., Dennis E. S. and Peacock W. J. (1988b) A role for haemoglobin in all plant roots? Plant, Cell Environ. 11, 359-367.
Appleby C. A., Dennis E. S. and Peacock W. J. (1990) A primeval origin for plant and animal haemoglobins? Austr. System. Bot. 3, 81-89.
Appleby C. A., Tjepkema J. D. and Trinick M. J. (1983) Hemoglobin in a non-leguminous plant, Parasponia: possible genetic origin and function in nitrogen fixation. Science 220, 951-953.
Bakan D. A., Saltman P., Theriault Y. and Wright P. E. (1991) Kinetics and mechanisms of reduction of Cu(II) and Fe(II) complexes by soybean leghemoglobin a. Biochim. biophys. Acta 1079, 182-196.
Baker D. D. and Mullin B. C. (1992) Actinorhizal symbioses. In Biological Nitrogen Fixation (Edited by Stacey G., Burris R. H. and Evans H. J.), pp. 259-292. Routledge, Chapman & Hall, London.
Barbour W. M., Wang S. Q. and Stacey G. (1992) Molecular genetics of Bradyrhizobium symbioses. In Biological Nitrogen Fixation (Edited by Stacey G., Burris R. H. and Evans H. J.), pp. 648-684. Routledge, Chapman & Hall, London.
Barker D. G., Gallusci P., Lullien V., Khan H., Gherardi M. and Huguet T. (1988) Identification of two groups of leghemoglobin genes in alfalfa (Medicago sativa) and a study of their expression during root nodule development. Plant molec. Biol. 11, 761-772.
Barker D. G., Bianchi S., Blondon F., Dattee Y., Duc G., Flament P., Gallusci P., Genier G., Guy P., Muel X., Tourneur J., Denarie J. and Huguet T. (1990) Medicago truncatula, a model plant for studying the molecular genetics of the Rhizobium-legume symbiosis. Plant molec. Biol. Rep. 8, 40-49.
Barnes R. S. K., Calow P. and Olive P. J. W. (1988) The Invertebrates: a New Synthesis. Blackwell Scientific, Oxford.
Bashford D., Chothia C. and Lesk A. M. (1987) Determinants of a protein fold. Unique features of the globin amino acid sequences. J. molec. Biol. 196, 199-216.
Baulcombe D. and Verma D. P. S. (1978) Preparation of a cDNA for leghaemoglobin and direct demonstration that leghaemoglobin is encoded by the soybean genome. Nucl. Acids Res. 5, 4141-4153.
Baur X., Dewair M., Haegele K., Prelicz H., Scholl A. and Tichy H. (1983) Common antigenic determinant of haemoglobin as basis of immunological cross-reactivity between chironomid species (Diptera, Chironomidae): studies with human and animal sera. Clin. exp. Immunol. 54, 599-607.
Baur X., Aschauer H., Mazur G., Dewair M., Prelicz H. and Steigmann W. (1986) Structure, antigenic determinants of some clinically important insect allergens: chironomid hemoglobins. Science 233, 351-354.
Becking J. H. (1992) The Rhizobium symbiosis of the nonlegume Parasponia. In Biological Nitrogen Fixation (Edited by Stacey G., Burris R. H. and Evans H. J.), pp. 497-559. Routledge, Chapman & Hall, London.
Bergtrom G., Laufer H. and Rogers R. (1976) Fat body: a site of hemoglobin synthesis in Chironomus thummi (Diptera). J. Cell Biol. 69, 264-274.
Bergtrom G. and Robinson J. (1977) Ultrastructural localization of the site of hemoglobin synthesis in Chironomus thummi (Diptera). J. Ultrastruct. Res. 60, 395-405.
Birnstiel M. L., Busslinger M. and Strub K. (1985) Transcription termination and 3' processing: the end is in site! Cell 41, 349-359.
Blanchetot A., Wilson V., Wood D. and Jeffreys A. J. (1983) The seal myoglobin gene: an unusually long globin gene. Nature 301, 732-734.
Blanchetot A., Wilson V., Wood D. and Jeffreys A. J. (1986) The mouse myoglobin gene. Characterisation and sequence comparison with other mammalian myoglobin genes. Eur. J. Biochem. 159, 469-474.
Bogusz D., Appleby C. A., Landsmann J., Dennis E. S., Trinick M. J. and Peacock W. J. (1988) Functioning hemoglobin genes in non-nodulating plants. Nature 331, 178-180.
Bogusz D., Kortt A. A. and Appleby C. A. (1987) Sesbania rostrata root and stem nodule leghemoglobins: purification and relationship among the seven major components. Archs Biochem. Biophys. 254, 263-271.
Bogusz D., Llewellyn D., Craig S., Dennis E. S., Appleby C. A. and Peacock W. J. (1990) Nonlegume hemoglobin gene and organ-specific expression in heterologous plants. Plant Cell 2, 633-641.
Bojsen K., Abildsten D., Jensen E. O., Paludan K. and Marcker K. A. (1983) The chromosomal arrangement of six soybean leghemoglobin genes. EMBO J. 2, 1165-1168.
Boyd D. and Beckwith J. (1990) The role of charged amino acids in the localization of secreted and membrane proteins. Cell 62, 1031-1033.
Breathnach R. and Chambon P. A. (1981) Organization and expression of eucaryotic split genes coding for proteins. A. Rev. Biochem. 50, 349-383.
Brisson N. and Verma D. P. S. (1982) Soybean leghemoglobin gene family: normal, pseudo and truncated genes. Proc. natn. Acad. Sci. U.S.A. 79, 4055-4059.
Brown G. G., Lee J. S., Brisson N. and Verma D. P. S. (1984) The evolution of a plant globin gene family. J. molec. Evol. 21, 19-32.
Buse G., Steffens G. J., Braunitzer G. and Steer W. (1979) Hämoglobin (Erythrocruorin) CTT III aus Chironomus thummi thummi (Diptera). Primärstruktur und Beziehung zu anderen Hämproteinen. Z. physiol. Chem. 360, 89-97.
Chapman R. F. (1982) The Insects: Structure and Function, 3rd edn, pp. 378-500. Harvard University Press, Cambridge, MA.
Cheng J. F., Raid L. and Hardison R. C. (1986) Isolation and nucleotide sequence of the rabbit globin gene cluster ψζ-α1-ψα. J. biol. Chem. 261, 839-848.
Christensen T., Dennis E. S., Peacock J. W., Landsmann J. and Marcker K. A. (1991) Hemoglobin genes in nonlegumes: cloning and characterization of a Casuarina glauca hemoglobin gene. Plant molec. Biol. 16, 339-344.
Christensen T., Sandal N. N., Stougaard J. and Marcker K. A. (1989) 5' Flanking sequence of the soybean leghemoglobin lbc3 gene. Nucl. Acids Res. 17, 4383.
Como P. F. and Thompson E. O. P. (1980) Amino acid sequence of the alpha chain of the tetrameric haemoglobin of the bivalve mollusc Anadara trapezia. Austr. J. biol. Sci. 33, 653-664.

Organization of non-vertebrate globin genes Cunningham T. S. and Cooper T. G. (1991) Expression of the DalS0 gene, whose product is homologous to the GATA factors and is a negative regulator of multiple nitrogen catabolic genes in Saccharomyces cerevisiae, insensitive to nitrogen catabolite repression. Molec. Cell Biol. 11, 6205-6215. Darawshe S. and Daniel E. (1991) Structure of the polypeptide chain of extraceUular hemoglobin from the nematode Ascaris suum. Comp. Biochem. Physiol. 99B, 425-429. Darawshe S., Tsafadyah Y. and Daniel E. (1987) Quaternary structure of erythrocruorin from the nematode Ascaria suum. Biochem. J. 242, 689-694. David M., Daveran M. L., Batut J., Dedieu A., Domerque O., Chai J., Hertig C., Boistard P. and Kahn D. (1988) Cascade regulation of n/f gene expression in Rhizobium meliloti. Cell 54, 671-683. Davidowitz E. J., Dow A. and Lang-Unnasch N. (1989) Nucleotide sequence of a eDNA clone encoding a leghemoglobin from Medicago sativa. Nucl. Acids Res. 17, 3307. de Bruijn F. J., Felix G., Grunenberg B., Hoffman H. J., Metz B., Rotet P., Simons-Schrirer A., Szabados L., Welters P. and Schell J. 0989) Regulation of plant genes specifically induced in nitrogen fixing nodules: role of c/sacting elements and trana-acting factors in leghemoglobin gene expression. Plant molec. Biol. 13, 319-325. Dickstein R., Scheirer D. C., Fowle W. H. and Ausubel F. M. (1991) Nodules elicited by Rhizobium meliloti mutants are arrested at an early stage of development. Molec. gen. Genet. 230, 423-432. Dierks P., van Doyen A., Cochran M. D., Dobkin D., Reiser J. and Weissman C. 0983) Three regions upstream from the cap site are required for efficient and accurate transcription of the rabbit beta-globin genes in mouse 3T6 cells. Cell 32, 695-706. Dikshit R. P., Dikshit K. L., Liu Y. and Webster D. A. (1991) The bacterial hemoglobin from Vitreoscilla can support aerobic growth of Escherischia coil lacking terminal oxidases. Archs Biochem. Biophys. 293, 241-245. 
Dikshit K. L., Dikshit R. P. and Webster D. A. (1990) Study of Vitreoscilla globin (vgb) gene expression and promoter activity in E. coli through transcriptional fusion. Nucl. Acids Res. 18, 4149-4155.
Dikshit K. L. and Webster D. A. (1988) Cloning, characterization and expression of the bacterial globin gene from Vitreoscilla in E. coli. Gene 70, 377-386.
Dixon B., Walker B., Kimmins W. and Pohajdak B. (1991) Isolation and sequencing of an unusual hemoglobin from the parasitic nematode Pseudoterranova decipiens. Proc. natn. Acad. Sci. U.S.A. 88, 5655-5659.
Dixon B., Walker B., Kimmins W. and Pohajdak B. (1992) A nematode hemoglobin gene contains an intron previously thought to be unique to plants. J. molec. Evol. 35, 131-136.
Doolittle R. F., Anderson K. L. and Feng D. F. (1989) In The Hierarchy of Life (Edited by Fernholm B., Bermer K. and Jornvall H.), pp. 73-85. Elsevier, Amsterdam.
Eardly B. D., Materon L. A., Smith N. H., Johnson D. A., Rumbaugh M. D. and Selander R. K. (1990) Genetic structure of natural populations of the nitrogen-fixing bacterium Rhizobium meliloti. Appl. Envir. Microbiol. 56, 187-194.
Efstratiadis A., Posakony J. W., Maniatis T., Lawn R. M., O'Connell C., Spritz R. A., DeRiel J. D., Forget B. G., Weissman S. M., Slightom J. L., Blechl A. E., Smithies O., Baralle F. E., Shoulders C. C. and Proudfoot N. J. (1980) The structure and evolution of the human beta-globin gene family. Cell 21, 653-668.
English D. S. (1969) Ontogenetic changes in hemoglobin synthesis in Chironomus thummi (Diptera). J. Embryol. exp. Morph. 72, 465-476.
Erbil C. and Niessing J. (1984) Chromosomal arrangement of duck alpha-globin genes and primary structure of the embryonic alpha-globin gene π'. Gene 32, 161-170.
Fabb S. A., Cheong M. F., Lee B. T. O. and Martin J. (1990) A sex-influenced protein in Chironomus (Diptera). Insect Biochem. 20, 135-140.
Fisher W. K., Gilbert A. T. and Thompson E. O. P. (1984) Amino acid sequence of the globin IIB chain of the dimeric haemoglobin of the bivalve mollusc Anadara trapezia. Austr. J. biol. Sci. 37, 191-203.
Fleming A. I., Wittenberg J. B., Wittenberg B. A., Dudman W. F. and Appleby C. A. (1987) The purification, characterization and ligand-binding kinetics of hemoglobins from root nodules of the non-leguminous Casuarina glauca-Frankia symbiosis. Biochim. biophys. Acta 911, 209-220.
Frenkel M. J., Dopheide T. A. A., Wagland B. M. and Ward C. W. (1992) The isolation, characterization and cloning of a globin-like, host-protective antigen from the excretory-secretory products of Trichostrongylus colubriformis. Molec. Biochem. Parasit. 50, 27-36.
Fuchsmann W. H. and Appleby C. A. (1979) Separation and determination of the relative concentrations of the homogeneous components of soybean leghemoglobin by isoelectric focusing. Biochim. biophys. Acta 579, 314-324.
Fuller F. and Verma D. P. S. (1984) Appearance and accumulation of nodulin mRNAs and their relationship to the effectiveness of root nodules. Plant molec. Biol. 3, 21-28.
Fushitani K., Matsuura M. S. A. and Riggs A. F. (1988) The amino acid sequences of chains a, b and c that form the trimer subunit of the extracellular hemoglobin of Lumbricus terrestris. J. biol. Chem. 263, 6502-6517.
Gallusci P., Dedieu A., Journet E. P., Huguet T. and Barker D. G. (1991) Synchronous expression of leghemoglobin genes in Medicago truncatula during nitrogen-fixing root nodule development and response to exogenously supplied nitrate. Plant molec. Biol. 17, 335-349.
Gilbert A. T. and Thompson E. O. P. (1985) Amino acid sequence of the β-chain of the tetrameric haemoglobin of the bivalve mollusc Anadara trapezia. Austr. J. biol. Sci. 38, 221-236.
Gilles-Gonzalez M. A., Ditta G. S. and Helinski D. R. (1991) A haemoprotein with kinase activity encoded by the oxygen sensor of Rhizobium meliloti. Nature 350, 170-172.
Go M. (1981) Correlation of DNA exonic regions with protein units in hemoglobin. Nature 291, 90-92.
Goodman M., Braunitzer G., Kleinschmidt T. and Aschauer H. (1983) The analysis of a protein polymorphism. Evolution of monomeric and homodimeric haemoglobins (erythrocruorins) of Chironomus thummi thummi (Insecta, Diptera). Z. Physiol. Chem. 354, 205-217.
Goodman M., Pedwayson J., Czelusniak J., Suzuki T., Gotoh T., Moens L., Shishikura F., Walz D. A. and Vinogradov S. N. (1988) An evolutionary tree for invertebrate globin sequences. J. molec. Evol. 27, 236-249.
Govers F., Nap J. P., van Kammen A. and Bisseling T. (1987) Nodulins in the developing root nodule: an overview. Plant Physiol. Biochem. 25, 309-322.
Guerinot M. L. and Chelm B. K. (1986) Bacterial delta-aminolevulinic acid synthase activity is not essential for leghemoglobin formation in the soybean/Bradyrhizobium japonicum symbiosis. Proc. natn. Acad. Sci. U.S.A. 83, 4403-4407.
Hankeln T., Luther C., Rozynek P. and Schmidt E. R. (1991) Complete nucleotide sequence of a hemoglobin gene cluster from the midge Chironomus thummi piger. In Structure and Function of Invertebrate Oxygen Carriers (Edited by Vinogradov S. N. and Kapp O. H.), pp. 287-296. Springer, New York.
Hankeln T., Rozynek P. and Schmidt E. R. (1988) The nucleotide sequence and in situ localization of a gene for a dimeric hemoglobin from the midge Chironomus thummi piger. Gene 64, 297-304.
Hardison R. C. (1991) Evolution of globin gene families. In Evolution at the Molecular Level (Edited by Selander R. K., Clark A. G. and Whittam T. S.), pp. 272-290. Sinauer, Sunderland, MA.
Hattori J. and Johnson D. A. (1985) The detection of leghemoglobin-like sequences in legumes and non-legumes. Plant molec. Biol. 4, 285-292.
Hyldig-Nielsen J., Jensen E. O., Paludan K., Wiborg O., Garrett R., Jorgensen P. and Marcker K. A. (1982) The primary structure of two leghemoglobin genes from soybean. Nucl. Acids Res. 10, 689-701.
Iwaasa H., Takagi T. and Shikama K. (1989) Protozoan myoglobin from Paramecium caudatum. J. molec. Biol. 208, 355-358.
Iwaasa H., Takagi T. and Shikama K. (1991) Amino acid sequence of hemoglobin from yeast Candida norvegensis. Zool. Sci. 8, 1134.
Jacobs N. J., Borotz S. E. and Guerinot M. L. (1989) Protoporphyrinogen oxidation, a step in heme synthesis in soybean root nodules and free-living rhizobia. J. Bact. 171, 573-576.
Jeffreys A. J. (1981) In Genetic Engineering (Edited by Williamson R.), Vol. 2, pp. 1-48. Academic Press, London.
Jensen E. O., Hein J., Paludan K. and Marcker K. A. (1983) The soybean leghemoglobin gene family. In Plant Molecular Biology (Edited by Goldberg R.), pp. 367-379. Alan R. Liss, New York.
Jensen E. O., Marcker K. A., Schell J. and de Bruijn F. J. (1988) Interaction of a nodule-specific trans-acting factor with distinct DNA elements in the soybean leghemoglobin lbc3 5'-upstream region. EMBO J. 7, 1265-1271.
Jensen E. O., Paludan K., Hyldig-Nielsen J. J., Jorgensen P. and Marcker K. A. (1981) The structure of a chromosomal leghemoglobin gene from soybean. Nature 291, 677-679.
Jhiang S. M., Garey J. R. and Riggs A. F. (1988) Exon-intron organization in genes of earthworm and vertebrate globins. Science 240, 334-336.
Jhiang S. M. and Riggs A. F. (1989) The structure of the gene encoding chain c of the hemoglobin of the earthworm Lumbricus terrestris. J. biol. Chem. 264, 19003-19008.
Jing Y., Paau A. S. and Brill W. J. (1982) Leghemoglobins from alfalfa (Medicago sativa L. vernal) root nodules I. Purification and in vitro synthesis of five leghemoglobin components. Plant Sci. Lett. 25, 119-132.
Joshi C. P. (1987) Putative polyadenylation signals in nuclear genes of higher plants: a compilation and analysis. Nucl. Acids Res. 15, 9627-9640.
Kao W. Y. and Bergtrom G. (1992) The evolution of the globin genes of the insect Chironomus thummi. International Congress on Invertebrate Dioxygen Carriers. Lunteren, The Netherlands, April 12-17.
Keilin D. (1956) Distribution and diversity of haemoglobin. Acta biochim. pol. 3, 439-457.
Keller E. B. and Noon W. A. (1984) Intron splicing: a conserved internal signal in introns of animal pre-mRNAs. Proc. natn. Acad. Sci. U.S.A. 81, 7417-7420.
Khosla C. and Bailey J. E. (1988a) The Vitreoscilla hemoglobin gene: molecular cloning, nucleotide sequence and genetic expression in Escherichia coli. Molec. gen. Genet. 214, 158-161.
Khosla C. and Bailey J. E. (1988b) Heterologous expression of a bacterial haemoglobin improves the growth properties of recombinant Escherichia coli. Nature 331, 633-636.
Khosla C. and Bailey J. E. (1989) Evidence for partial export of Vitreoscilla hemoglobin into the periplasmic space in Escherichia coli. J. molec. Biol. 210, 79-89.
Kiss G. B., Vegh Z. and Vincze E. (1987) Nucleotide sequence of a cDNA clone encoding leghemoglobin III (LbIII) from Medicago sativa. Nucl. Acids Res. 15, 3620.

Kortt A. A., Burns J. E., Trinick M. J. and Appleby C. A. (1985) The amino acid sequence of hemoglobin I from Parasponia andersonii, a nonleguminous plant. FEBS Lett. 180, 55-60.
Kortt A. A., Strike P. M., Bogusz D. and Appleby C. A. (1987) The amino acid sequence of hemoglobin II from Sesbania rostrata stem nodules. Biochem. Int. 15, 509-516.
Kreil G. (1981) Transfer of proteins across membranes. A. Rev. Biochem. 50, 317-348.
Kubo H. (1939) Über Hämoprotein aus den Wurzelknöllchen von Leguminosen. Acta Phytochim. 11, 195-200.
Kuhse J. and Puehler A. (1987) Conserved sequence motifs in the untranslated 3' end of leghemoglobin transcripts isolated from broadbean nodules. Plant Sci. 49, 137-143.
Landsmann J., Dennis E. S., Higgins T. J. V., Appleby C. A., Kortt A. A. and Peacock W. J. (1986) Common evolutionary origin of legume and non-legume plants haemoglobins. Nature 324, 166-168.
Landsmann J., Dennis E. S., Higgins T. J. V., Appleby C. A., Kortt A. A. and Peacock W. J. (1988) Tissue regulated expression of the Parasponia haemoglobin genes in transgenic tobacco plants. Molec. gen. Genet. 214, 68-73.
Laufer H., Bergtrom G. and Rogers R. (1976) Stage and tissue specific hemoglobin synthesis in an invertebrate. In Invertebrate Tissue Culture (Edited by Kurstak E. and Maramorasch K.), pp. 227-240. Academic Press, New York.
Laufer H. and Poluhowich J. (1971) A factor controlling the concentration of hemoglobins in Chironomus during metamorphosis. Limnologica (Berlin) 8, 125-126.
Laufer H., Vafopoulou-Mandalos X., Kuliawat R. and Gundling G. (1982) Tissue-specific and gene-specific sites of hemoglobin synthesis in Chironomus. In Embryonic Development. Part A: Genetic Aspects (Edited by Burger M. M. and Weber R.), Vol. 85, pp. 327-335. Alan R. Liss, New York.
Lee J. S., Brown G. G. and Verma D. P. S. (1983) Chromosomal arrangement of leghemoglobin genes in soybean. Nucl. Acids Res. 11, 5541-5553.
Legocki R. P. and Verma D. P. S. (1980) Identification of nodule-specific host proteins involved in the development of Rhizobium-legume symbiosis. Cell 20, 153-163.
Lesk A. M. and Chothia C. (1980) How different amino acid sequences determine similar protein structures: the structure and evolutionary dynamics of the globins. J. molec. Biol. 136, 225-270.
Leung S., Proudfoot N. J. and Whitelaw E. (1987) The gene for θ-globin is transcribed in human fetal erythroid tissues. Nature 329, 551-554.
Long S. R. (1989) Rhizobium-legume nodulation: life together in the underground. Cell 56, 203-214.
Maniatis T., Fritsch E. F., Lauer J. and Lawn R. M. (1980) The molecular genetics of human hemoglobins. A. Rev. Genet. 14, 145-178.
Mann R. G., Fisher W. K., Gilbert A. T. and Thompson E. O. P. (1986) Genetic variation of the dimeric haemoglobin of the bivalve mollusc Anadara trapezia. Austr. J. biol. Sci. 39, 109-115.
Marcker K. A., Gausing K., Jochimsen B., Jorgensen P., Paludan K. and Truelsen E. (1981) In Genetic Engineering in the Plant Sciences (Edited by Panopoulos N. J.), pp. 63-71. Praeger, New York.
Marcker A., Lund M., Jensen E. O. and Marcker K. A. (1984) Transcription of the soybean leghemoglobin genes during nodule development. EMBO J. 3, 1691-1695.
Marcker K. A. and Sandal N. N. (1987) Evolution of the leghemoglobins. In Plant Molecular Biology (Edited by von Wettstein D. and Chua N. H.), pp. 505-508. Plenum Press, New York.
Margot J. B., Demers G. W. and Hardison R. C. (1989) Complete nucleotide sequence of the rabbit β-globin cluster. Analysis of intergenic sequences and comparisons with the human β-like globin gene cluster. J. molec. Biol. 205, 15-40.
Metz B. A., Welters P., Hoffman H. J., Jensen E. O., Schell J. and de Bruijn F. J. (1988) Primary structure and promoter analysis of leghemoglobin genes of the stem-nodulated tropical legume Sesbania rostrata: conserved coding sequences, cis-elements and trans-acting factors. Molec. gen. Genet. 214, 181-191.
Mohapatra S. S. and Puhler A. (1986) Detection of nodule-specific polypeptides from effective and ineffective root nodules of Medicago sativa L. J. Plant Physiol. 126, 269-281.
Montell C., Fisher E. F., Caruthers M. H. and Berk A. J. (1983) Inhibition of RNA cleavage but not polyadenylation by a point mutation in mRNA 3' consensus sequence AAUAAA. Nature 305, 600-605.
Myers C. R., Collins M. L. P., Agresti M. and Bergtrom G. (1986) Haemoglobin-producing tissues of larvae and pupae of Chironomus thummi (Diptera). J. Insect Physiol. 32, 845-851.
Nadler K. D. and Avissar Y. J. (1977) Heme synthesis in soybean root nodules: on the role of bacteroid delta-aminolevulinic acid synthase and delta-aminolevulinic acid dehydrase in the synthesis of the heme of leghemoglobin. Plant Physiol. 60, 433-436.
Naito Y., Riggs C. K., Vandergon T. L. and Riggs A. F. (1991) Origin of a "bridge" intron in the gene for a two-domain globin. Proc. natn. Acad. Sci. U.S.A. 88, 6672-6676.
Naito Y., Riggs C. K., Naritomi Y., Vandergon T. and Riggs A. F. (1992) Structure of the gene for the two-domain globin of Barbatia reeveana. International Congress on Invertebrate Dioxygen Carriers. Lunteren, The Netherlands, April 12-17.
Nishioka Y., Leder A. and Leder P. (1980) Unusual alpha-globin-like gene that has clearly lost both globin intervening sequences. Proc. natn. Acad. Sci. U.S.A. 77, 2806-2809.
O'Brian M. R., Kirschbom P. M. and Maier R. J. (1987) Bacterial heme synthesis is required for expression of the leghemoglobin holoprotein but not the apoprotein in soybean root nodules. Proc. natn. Acad. Sci. U.S.A. 84, 8390-8393.
Oliver S. G. et al. (1992) The complete DNA sequence of yeast chromosome III. Nature 357, 38-46.
Oshino R., Asakura T., Takio K., Ishino N. and Chance B. (1973) Purification and molecular properties of yeast hemoglobin. Eur. J. Biochem. 39, 581-590.
Osmulski P. and Leyko W. (1986) Structure, function and physiological role of Chironomus haemoglobin. Comp. Biochem. Physiol. 85B, 701-722.
Pfletschinger J., Plagens H. and Braunitzer G. (1980) The primary structure of the monomeric haemoglobin (erythrocruorin) component CTT IV of Chironomus thummi thummi (Insecta, Diptera). Z. Naturforsch. 35c, 840-843.
Pinero D., Martinez E. and Selander R. K. (1988) Genetic diversity and relationships among isolates of Rhizobium leguminosarum biovar phaseoli. Appl. environ. Microbiol. 54, 2825-2832.
Probst L., Wolf G. and Schlegel H. G. (1979) An oxygen-binding flavohemoprotein from Alcaligenes eutrophus. Biochim. biophys. Acta 576, 471-478.
Reznikoff W. and McClure W. R. (1986) E. coli promoters. In Maximizing Gene Expression (Edited by Reznikoff W. and Gold L.), pp. 1-33. Butterworths, Boston, MA.
Riggs A. F., Riggs C. K., Lin R. J. and Domdey H. (1986) Cloning of the cDNA for the globin from the clam Barbatia reeveana. In Invertebrate Oxygen Carriers (Edited by Linzen B.), pp. 473-476. Springer, Berlin.
Riggs C. K. and Riggs A. F. (1990) cDNA-derived amino acid sequences of single and two-domain globins from the clam Barbatia reeveana. In Invertebrate Dioxygen Carriers (Edited by Preaux G. and Lontie R.), pp. 57-60. Leuven University Press, Leuven.
Riggs A. F. (1991) Aspects of the origin and evolution of nonvertebrate hemoglobins. Am. Zool. 31, 535-545.
Riggs A. F. (1992) Evolutionary pathways of hemoglobin. International Congress on Invertebrate Dioxygen Carriers. Lunteren, The Netherlands, April 12-17.
Roberts M. P., Jafar S. and Mullin B. C. (1985) Leghemoglobin-like sequences in the DNA of four actinorhizal plants. Plant molec. Biol. 5, 333-337.
Robertson J. G., Wells B., Bisseling T., Farnden K. J. F. and Johnson A. W. B. (1984) Immunogold localization of leghaemoglobin in cytoplasm of nitrogen-fixing root nodules of pea. Nature 311, 254-256.
Rozynek P., Broecker M., Hankeln T. and Schmidt E. R. (1991) The primary structure of several hemoglobin genes from the genome of Chironomus tentans. In Structure and Function of Invertebrate Oxygen Carriers (Edited by Vinogradov S. N. and Kapp O. H.), pp. 297-303. Springer, New York.
Rozynek P., Hankeln T. and Schmidt E. R. (1989) Structure of a hemoglobin gene cluster and nucleotide sequence of three hemoglobin genes from the midge Chironomus thummi piger (Diptera, Insecta). Biol. Chem. Hoppe-Seyler 370, 533-542.
Saffarini D. A., Trewitt P. M., Castro M., Wejksnora P. J. and Bergtrom G. (1985) Deoxynucleotide sequence of an insect cDNA codes for an unreported member of the Chironomus thummi globin family. Biochem. biophys. Res. Commun. 133, 641-647.
Saffarini D. A., Trewitt P. M., Luhm R. A. and Bergtrom G. (1991) Differential regulation of insect globin and actin mRNAs during larval development in Chironomus thummi. Gene 101, 215-222.
Sandal N. N., Bojsen K. and Marcker K. A. (1987) A small family of nodule-specific genes from soybean. Nucl. Acids Res. 15, 1507-1519.
Schmidt E. R., Keyl H. G. and Hankeln T. (1988) In situ localization of two haemoglobin gene clusters in the chromosomes of 113 species of Chironomus. Chromosoma 96, 353-359.
Sladić-Simić D., Kleinschmidt T. and Braunitzer G. (1979) Hämoglobin XXVI. Analyse der Primärstruktur des dimeren Insektenhämoglobins CTT VIIB (Erythrocruorin) aus Chironomus thummi thummi (Diptera). Hoppe-Seyler's Z. Physiol. Chem. 360, 115-124.
Slightom J. L., Sun S. M. and Hall T. C. (1983) Complete nucleotide sequence of a French bean storage protein: phaseolin. Proc. natn. Acad. Sci. U.S.A. 80, 1897-1901.
Stormo G. D. (1986) Translation initiation. In Maximizing Gene Expression (Edited by Reznikoff W. and Gold L.), pp. 195-224. Butterworths, Boston, MA.
Stougaard J., Marcker K. A., Otten L. and Schell J. (1986) Nodule specific expression of a chimaeric soybean leghaemoglobin gene in transgenic Lotus corniculatus. Nature 321, 669-674.
Stougaard J., Petersen T. E. and Marcker K. A. (1987b) Expression of a complete soybean leghaemoglobin gene in root nodules of transgenic Lotus corniculatus. Proc. natn. Acad. Sci. U.S.A. 84, 5754-5757.
Stougaard J., Jorgensen J. E., Christensen T., Kuhle A. and Marcker K. A. (1990) Interdependence and nodule specificity of cis-acting elements in the soybean leghemoglobin lbc3 and N23 gene promoters. Molec. gen. Genet. 220, 353-360.
Stougaard J., Sandal N. N., Gron A., Kuhle A. and Marcker K. A. (1987a) 5' Analysis of the soybean lbc3 gene: regulatory elements required for promoter activity and organ specificity. EMBO J. 6, 3365-3369.
Strittmatter G., Chia T. F., Trinh T. H., Katajiri F., Kuhlemeier C. and Chua N. H. (1989) Characterization of nodule-specific cDNA clones from Sesbania rostrata and expression of the corresponding genes during the initial stages of stem nodules and root nodules formation. Molec. Plant Microbe Int. 2, 122-127.
Sullivan D., Brisson N., Goodchild B., Verma D. P. S. and Thomas D. Y. (1981) Molecular cloning and organization of two leghemoglobin genomic sequences of soybean. Nature 289, 516-518.
Sulston J., Du Z., Thomas K., Wilson R., Hillier L., Staden R., Halloran N., Green P., Thierry-Mieg J., Qiu L., Dear S., Coulson A., Craxton M., Durbin R., Berks M., Metzstein M., Hawkins T., Ainscough R. and Waterston R. (1992) The C. elegans genome sequencing project: a beginning. Nature 356, 37-41.
Suzuki T., Jhiang S. M., Donahue R. A. and Riggs A. F. (1992) The structure of the gene encoding a non-heme chain of the hemoglobin of the earthworm Lumbricus terrestris. International Congress on Invertebrate Dioxygen Carriers. Lunteren, The Netherlands, April 12-17.

Suzuki T., Takagi T. and Gotoh T. (1990a) Primary structure of two linker chains of the extracellular hemoglobin from the polychaete Tylorrhynchus heterochaetus. J. biol. Chem. 265, 12168-12177.
Titchen D. A., Glenn W. K., Nassif N., Thompson A. R. and Thompson E. O. P. (1991) A minor globin gene of the bivalve mollusc Anadara trapezia. Biochim. biophys. Acta 1089, 61-67.
Tjepkema J. D. (1983) Hemoglobins in the nitrogen-fixing root nodules of actinorhizal plants. Can. J. Bot. 61, 2924-2929.
Tjepkema J. D. and Asa D. J. (1987) Total and CO-reactive heme content of some non-nodulating plants. Plant and Soil 100, 225-236.
Tjepkema J. D., Schwintzer C. R. and Benson D. R. (1986) Physiology of actinorhizal nodules. Rev. Plant Physiol. 37, 209-232.
Trewitt P. M. and Bergtrom G. (1982) Immunological characterization of the haemoglobins of Chironomus thummi (Diptera). Insect Biochem. 11, 635-644.
Trewitt P. M., Boyer D. R. and Bergtrom G. (1986) Characterization of maternal haemoglobins in the eggs and embryos of Chironomus thummi. J. Insect Physiol. 32, 963-969.
Trewitt P. M., Saffarini D. A. and Bergtrom G. (1988) Multiple clustered genes of the haemoglobin VIIB subfamily of Chironomus thummi thummi (Diptera). Gene 69, 91-100.
Uheda E. and Syono K. (1982) Physiological role of leghaemoglobin heterogeneity in pea root nodule development. Plant Cell Physiol. 23, 75-84.
Vafopoulou-Mandalos X. and Laufer H. (1982) The ontogeny of multiple hemoglobins in Chironomus thummi (Diptera): the effect of a compound with juvenile hormone activity. Devl. Biol. 92, 135-143.
Vafopoulou-Mandalos X. and Laufer H. (1984a) Changes in template activity of protein and globin mRNA during Chironomus development. Differentiation 25, 209-215.
Vafopoulou-Mandalos X. and Laufer H. (1984b) Tissue-specificity of hemoglobin synthesis. Localization of heme synthesis in the subepidermal fat body of Chironomus thummi (Diptera). Arch. Insect Biochem. Physiol. 1, 191-197.
Vanin E. F. (1984) Processed pseudogenes. Biochim. biophys. Acta 782, 231-241.
Vanin E. F., Goldberg G. I., Tucker P. W. and Smithies O. (1980) A mouse alpha-globin-related pseudogene lacking intervening sequences. Nature 286, 222-226.
Van Kammen A. (1984) Suggested nomenclature for plant genes involved in nodulation and symbiosis. Plant Molec. Biol. Rep. 2, 43-45.
Vasudevan S. G., Armarego W. L. F., Shaw D. C., Lilley P. E., Dixon N. E. and Poole R. K. (1991) Isolation and nucleotide sequence of the hmp gene that encodes a haemoglobin-like protein from Escherichia coli K-12. Molec. gen. Genet. 226, 49-58.
Verma D. P. S., Ball S., Guerin C. and Wanamaker L. (1979) Leghemoglobin biosynthesis in soybean root nodules. Characterization of the nascent and released peptides and the relative rate of synthesis of the major leghemoglobins. Biochemistry 18, 476-483.
Verma D. P. S. and Delauney A. J. (1988) Root nodule symbiosis: nodulins and nodulin genes. In Temporal and Spatial Regulation of Plant Genes (Edited by Verma D. P. S. and Goldberg R. B.), pp. 169-199. Springer, New York.
Verma D. P. S., Fortin M. G., Stanley J., Mauro V. P., Purolit S. and Morrison N. (1986) Nodulins and nodulin genes of Glycine max. Plant molec. Biol. 7, 51-61.
Vinogradov S. N. (1985) Quaternary structure of invertebrate extracellular hemoglobins and chlorocruorins. Comp. Biochem. Physiol. 82B, 1-16.
Vinogradov S. N., Sharma R. K., Qabar A. N., Wall J. S., Westrick J. A., Simmons J. H. and Gill S. J. (1991) A dodecamer of globin chains is the principal functional subunit of the extracellular hemoglobin of Lumbricus terrestris. J. biol. Chem. 266, 13091-13096.
Vinogradov S. N., Walz D. A., Moens L., Pohajdak B. and Kapp O. H. (1992) Adventitious variability: the amino acid sequences of non-vertebrate hemoglobins. (in preparation).
Wakabayashi S., Matsubara H. and Webster D. A. (1986) Primary sequence of a dimeric bacterial haemoglobin from Vitreoscilla. Nature 322, 481-483.
Weber R. E., Braunitzer G. and Kleinschmidt T. (1985) Functional multiplicity and structural correlations in the hemoglobin system of larvae of Chironomus thummi thummi (Insecta, Diptera): Hb components CTTI, CTTIIβ, CTTIII, CTTIV, CTTVI, CTTVIIB, CTTIX and CTTX. Comp. Biochem. Physiol. 80B, 747-753.
Weihs V., Schmidt K., Schneider B. and Friedrich B. (1989) The formation of an oxygen-binding flavohemoprotein in Alcaligenes eutrophus. Arch. Microbiol. 151, 546-550.
Weiner A. M., Deininger P. L. and Efstratiadis A. (1986) Nonviral retroposons: genes, pseudogenes and transposable elements generated by the reverse flow of genetic information. A. Rev. Biochem. 55, 631-661.
Weller P., Jeffreys A. J., Wilson V. and Blanchetot A. (1984) Organization of the human myoglobin gene. EMBO J. 3, 439-446.
Welters P., Metz B. A., Schell J. and de Bruijn F. J. (1989) Nucleotide sequence of the Sesbania rostrata leghemoglobin (Srglb3) gene. Nucl. Acids Res. 17, 1253.
Wiborg O., Hyldig-Nielsen J. J., Jensen E. O., Paludan K. and Marcker K. A. (1982) The nucleotide sequence of two leghemoglobin genes from soybean. Nucl. Acids Res. 10, 3487-3494.
Wiborg O., Hyldig-Nielsen J. J., Jensen E. O., Paludan K. and Marcker K. A. (1983) The structure of an unusual leghemoglobin gene from soybean. EMBO J. 2, 449-452.
Wittenberg J. B., Wittenberg B. A., Gibson Q. H., Trinick M. J. and Appleby C. A. (1986) The kinetics of the reactions of Parasponia andersonii hemoglobin with oxygen, carbon monoxide and nitric oxide. J. biol. Chem. 261, 13624-13631.
Yamauchi K. (1992) Private communication.
Yamauchi K., Mukai M., Ochiai T. and Usuki I. (1992) Molecular cloning of the cDNA for the major hemoglobin component from Paramecium caudatum. Biochem. biophys. Res. Commun. 182, 195-200.
Young J. P. W. (1985) Rhizobium population genetics: enzyme polymorphism in isolates from peas, clover, beans and lucerne grown at the same site. J. gen. Microbiol. 131, 2399-2408.

Young J. P. W. (1992) Phylogenetic classification of nitrogen-fixing organisms. In Biological Nitrogen Fixation (Edited by Stacey G., Burris R. H. and Evans H. J.), pp. 43-86. Routledge, Chapman & Hall, London.
Zharkikh A. A., Solovyov V. V. and Kolchanov N. A. (1984) Conformational changes in the globin family during evolution. I. Analysis of the evolutionary role of insertions and deletions. J. molec. Evol. 21, 42-53.


Zhu H. and Riggs A. F. (1992) Yeast flavohemoglobin is an ancient protein related to globins and a reductase family. Proc. natn. Acad. Sci. U.S.A. 89, 5015-5019.
Zuckerman B. M., Mai W. F. and Rohde R. A. (Editors) (1971) Plant Parasitic Nematodes. Academic Press, New York.
