Photosynthesis Research 16:7-22 (1988) © Kluwer Academic Publishers, Dordrecht - Printed in the Netherlands

Minireview

Gene organization and newly identified groups of genes of the chloroplast genome from a liverwort, Marchantia polymorpha

KANJI OHYAMA,1* TAKAYUKI KOHCHI,1 HIDEYA FUKUZAWA,1** TOHRU SANO,1 KAZUHIKO UMESONO2 & HARUO OZEKI2

1Research Center for Cell and Tissue Culture, Faculty of Agriculture, Kyoto University, Kyoto 606, Japan; 2Department of Biophysics, Faculty of Science, Kyoto University, Kyoto 606, Japan; *To whom all correspondence should be sent; **Present address: Institute of Applied Microbiology, University of Tokyo, Tokyo 113, Japan

Received 28 September 1987; accepted 1 December 1987

Key words: chloroplasts, iron-sulfur protein, membrane transport protein, Marchantia polymorpha, NADH(PQ) oxidoreductase

Abstract. The complete nucleotide sequence of chloroplast DNA from a liverwort, Marchantia polymorpha, has made clear the entire gene organization of the chloroplast genome. Quite a few genes encoding components of photosynthesis and protein synthesis machinery have been identified by comparative computer analysis. Other genes involved in photosynthesis, respiratory electron transport, and membrane-associated transport in chloroplasts were predicted by the amino acid sequence homology and secondary structure of gene products. Thirty-three open reading frames in the liverwort chloroplast genome remain unidentified. However, most of these open reading frames are also conserved in the chloroplast genomes of two species, a liverwort, Marchantia polymorpha, and tobacco, Nicotiana tabacum, indicating their active functions in chloroplasts.

Abbreviations: bp - base pair, kDa - kilodalton, IR - inverted repeat, ORF - open reading frame, DALA - δ-aminolevulinate

Introduction

The complete nucleotide sequences of chloroplast DNA from two green plants, a liverwort, Marchantia polymorpha (Ohyama et al. 1986), and tobacco, Nicotiana tabacum (Shinozaki et al. 1986), are known. The overall gene composition of both organisms is similar despite the large difference in genome size. Now we are able to integrate information on chloroplast photosynthetic mechanisms with the structure and function of chloroplast genes (Ellis 1981, Steinback et al. 1985). In this article, we will describe the overall gene organization and newly identified groups of genes, some of which are unique to liverwort, for proteins in photosynthesis, electron transport, and membrane-associated transport systems in chloroplasts.

Overall gene organization

The complete nucleotide sequence of the liverwort chloroplast DNA consists of 121,024 base pairs (bp) in a double-stranded circular form with a set of large inverted repeats (IRA and IRB, each of 10,058 bp), a large single copy region (LSC, 81,095 bp), and a small single copy region (SSC, 19,813 bp). The total G + C content of the liverwort chloroplast genome is 28.8% (G + C, 69,726; A + T, 172,322). The coding sequences for stable RNA genes have a higher G + C content, 52.6% for tRNAs and 52.1% for rRNAs. On the other hand, the coding sequences for proteins have 28.5% G + C, and the spacer regions between the coding sequences have much less G + C (19.5%). We have detected 136 possible genes, of which 103 gene products are related to known stable RNAs or proteins. Stable RNA genes for four species of ribosomal RNAs (rRNAs) and 32 species of transfer RNAs (tRNAs) (one of them, proline-like tRNA, is a pseudogene) have been located on the liverwort chloroplast genome. Twenty-five genes showed structural similarity to genes for the three subunits (α, β, and β' subunits) of Escherichia coli RNA polymerase (the β' subunit gene of E. coli may have been split into two liverwort chloroplast genes, rpoC1 and rpoC2), for 19 ribosomal proteins (rpl and rps genes), and for two related proteins (infA and secX genes). Twenty genes encoding polypeptides involved in photosynthesis, electron transport, and membrane-associated transport systems were deduced by comparison with known chloroplast or E. coli genes. Interestingly, seven open reading frames (ORFs) had a high degree of homology to human mitochondrial NADH dehydrogenase genes. We could further deduce two ORFs for bacterial 4Fe-4S type ferredoxin, one with homology to a component of nitrogenase, two for the components of inner membrane permease in E. coli, and one for an antenna protein of a light harvesting complex in cyanobacteria.
The other 33 ORFs (29 to 2136 codons) remain unidentified despite an extensive computer-aided search for homology.
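The region sizes and base counts quoted above can be cross-checked with a few lines of arithmetic. The sketch below is only a consistency check on the printed figures. The variable names are ours, and the reading that the G + C and A + T counts are summed over both DNA strands is an inference from the fact that they add up to twice the genome length, not a statement made in the text.

```python
# Figures quoted in the text (bp).
LSC = 81_095        # large single copy region
SSC = 19_813        # small single copy region
IR = 10_058         # one inverted repeat; IRA and IRB have the same length
GC_COUNT = 69_726
AT_COUNT = 172_322

# The circular genome is LSC + SSC plus both copies of the inverted repeat.
genome_size = LSC + SSC + 2 * IR

# Assumption: the base counts cover both strands of the duplex.
both_strands = GC_COUNT + AT_COUNT
gc_percent = 100 * GC_COUNT / both_strands

print(genome_size)             # 121024
print(both_strands // 2)       # 121024
print(round(gc_percent, 1))    # 28.8
```

Under that reading, the four region sizes and the two base counts are mutually consistent with the stated genome length and G + C content.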

Genes for tRNAs and rRNAs in liverwort chloroplast genome

The coding sequences for four kinds of rRNA genes (16S, 23S, 4.5S, and 5S) and 32 species of tRNA genes (proline tRNA(GGG) is a pseudogene with an incomplete aminoacyl stem structure) have been deduced from the entire liverwort chloroplast nucleotide sequence. Duplicated rRNA genes are located in the IR regions (IRA and IRB). Five duplicate tRNA genes, trnV(GAC), trnI(GAU), trnA(UGC), trnR(ACG), and trnN(GUU), are also present in the inverted repeat regions (Fig. 1). None of the tRNA genes has a CCA sequence at the 3' end of its coding sequence. The tRNA genes are scattered throughout the liverwort genome. Six of the tRNA genes, trnA(UGC), trnI(GAU), trnG(UCC), trnK(UUU), trnL(UAA), and trnV(UAC), have introns in their coding sequences (Fig. 1). All rRNA and tRNA molecules seem to be generated within the chloroplasts because it is generally believed that no RNA molecule can enter the chloroplasts from the cytoplasm. Therefore, the 31 species of tRNAs deduced from DNA sequence analysis must satisfy codon-anticodon recognition in the chloroplasts (Table 1).

The number of tRNA species in liverwort chloroplasts is smaller than the estimated number (about 50) in E. coli or Bacillus subtilis (Fournier and Ozeki 1985), but higher than the 24 species in yeast mitochondria (Bonitz et al. 1980) and the 22 species in human mitochondria (Anderson et al. 1981). In mitochondria, however, the codon table is not universal. Therefore, this is the first time that a complete set of tRNAs in a genetic system has been elucidated to clarify the universal codon table (Crick 1966). In tobacco chloroplasts, 30 species of tRNA genes have been described, with the difference that arginine tRNA(CCG) is present in liverwort but not in tobacco (Shinozaki et al. 1986). The liverwort proline tRNA(GGG) pseudogene was not detected in tobacco either. This implies that the number of tRNA genes in chloroplasts is probably being reduced in the course of evolution, as seen with mitochondria.

Fig. 1. Genetic map of genes for protein synthesis and related genes in the liverwort chloroplast genome. Nomenclature of chloroplast genes follows that of Crouse et al. (1985). Asterisks indicate genes with introns in their coding sequences. Genes rpo, infA, and secX are for subunits of RNA polymerase, initiation factor, and ribosomal protein of large subunits (L36; Wada and Sako 1987), respectively.

Table 1. Codon table and tRNAs encoded by liverwort chloroplast genome

Amino acid   Codons                 tRNA anticodon(s)
Phe          UUU, UUC               GAA
Leu          UUA, UUG               UAA*, CAA
Leu          CUU, CUC, CUA, CUG     UAG
Ile          AUU, AUC               GAU*
Ile          AUA                    CAU
Met          AUG                    CAU
fMet         AUG                    CAU
Val          GUU, GUC, GUA, GUG     GAC, UAC*
Ser          UCU, UCC, UCA, UCG     GGA, UGA
Pro          CCU, CCC, CCA, CCG     UGG, (GGG)
Thr          ACU, ACC, ACA, ACG     GGU, UGU
Ala          GCU, GCC, GCA, GCG     UGC*
Tyr          UAU, UAC               GUA
Ter          UAA, UAG, UGA
His          CAU, CAC               GUG
Gln          CAA, CAG               UUG
Asn          AAU, AAC               GUU
Lys          AAA, AAG               UUU*
Asp          GAU, GAC               GUC
Glu          GAA, GAG               UUC
Cys          UGU, UGC               GCA
Trp          UGG                    CCA
Arg          CGU, CGC, CGA, CGG     ACG, CCG
Ser          AGU, AGC               GCU
Arg          AGA, AGG               UCU
Gly          GGU, GGC, GGA, GGG     GCC, UCC*

Anticodons are expressed with unmodified bases. The AUG codon is an initiation codon. Termination codons (UAA, UAG, and UGA) are indicated by Ter. Asterisks indicate tRNA genes with introns in their coding sequences. Amino acids are shown by three-letter symbols. Proline tRNA(GGG) is a pseudogene.
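The tRNA set summarized in Table 1 can be tallied and checked against the universal code with a short script. This is a minimal sketch under our own assumptions: the anticodon list is transcribed from the table and text, one-letter amino acid symbols are used for brevity, and `watson_crick_codon` is a hypothetical helper that pairs all three positions strictly, ignoring wobble and base modification.

```python
from itertools import product

# Universal genetic code, codons ordered U, C, A, G at each position.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# Anticodons (5'->3', unmodified bases) transcribed from Table 1.
TRNAS = {
    "F": ["GAA"], "L": ["UAA", "CAA", "UAG"], "I": ["GAU", "CAU"],
    "M": ["CAU", "CAU"],  # elongator Met and initiator fMet
    "V": ["GAC", "UAC"], "S": ["GGA", "UGA", "GCU"],
    "P": ["UGG", "GGG"],  # GGG is the pseudogene
    "T": ["GGU", "UGU"], "A": ["UGC"], "Y": ["GUA"], "H": ["GUG"],
    "Q": ["UUG"], "N": ["GUU"], "K": ["UUU"], "D": ["GUC"], "E": ["UUC"],
    "C": ["GCA"], "W": ["CCA"], "R": ["ACG", "CCG", "UCU"],
    "G": ["GCC", "UCC"],
}
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def watson_crick_codon(anticodon):
    """Codon (5'->3') that pairs with the anticodon at all three positions."""
    return anticodon[::-1].translate(COMPLEMENT)


total = sum(len(acs) for acs in TRNAS.values())
mismatches = [
    (aa, ac)
    for aa, acs in TRNAS.items()
    for ac in acs
    if CODE[watson_crick_codon(ac)] != aa
]

print(total)       # 32 (31 tRNAs plus the proline pseudogene)
print(mismatches)  # [('I', 'CAU')]
```

The only entry that fails strict pairing is the isoleucine tRNA(CAU) listed against AUA, which is expected because the table writes anticodons as unmodified bases.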

A unique glutamate tRNA with an unusual modified anticodon is involved in chlorophyll synthesis in chloroplasts (Schon et al. 1986). A molecule of chlorophyll is synthesized from eight molecules of δ-aminolevulinate (DALA), the universal precursor of porphyrins. The components involved in chlorophyll synthesis contain an RNA as the essential component, identified as a chloroplast glutamate tRNA. The nucleotide sequence of the trnE(UUC) gene in the liverwort chloroplast genome had a high degree of homology with that reported for RNA(DALA) in higher plants. This trnE(UUC) must have dual functions for both protein and chlorophyll synthesis in the chloroplasts, because it is the only gene for glutamate tRNA in the liverwort chloroplast genome.

We have analyzed codon usages of all the genes including the unidentified ORFs. A distinctive feature of the codon usage is that 88.1% of the third letters of codons are either A or U. This A or U preference in the codon usage coincides with the overall high A + T content (71.2%) in the liverwort chloroplast genome, and it facilitated the precise identification of ORFs throughout the liverwort chloroplast genome. We observed a peculiar codon usage in the psbA gene (Table 2). As described above, either A or U is preferred at the third letter of codons in liverwort protein genes, but the codon usage of psbA differs in the choice of the third letter, especially for pyrimidine two-codon boxes such as asparagine, aspartic acid, cysteine, histidine, phenylalanine, serine, and tyrosine, where the psbA gene uses cytosine more often than uracil in the third letter of the codons. This peculiar codon usage in the psbA gene may be correlated with the translational efficiency of the mRNA molecules (G-C pairs give a stronger codon-anticodon association than A-U pairs).
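The third-letter statistic described above is straightforward to compute for any coding sequence. The following is a minimal sketch, not the authors' program: `third_position_au_fraction` is a hypothetical helper name, and the 18-nucleotide sequence is a made-up toy example rather than a liverwort gene.

```python
def third_position_au_fraction(cds):
    """Fraction of complete codons whose third base is A or U (T in DNA)."""
    seq = cds.upper().replace("T", "U")
    usable = len(seq) - len(seq) % 3          # ignore a trailing partial codon
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    if not codons:
        raise ValueError("sequence contains no complete codon")
    return sum(codon[2] in "AU" for codon in codons) / len(codons)


# Toy coding sequence (hypothetical): AUG AAA UUU GGA CCU UAA
toy = "ATGAAATTTGGACCTTAA"
frac = third_position_au_fraction(toy)
print(f"{frac:.3f}")  # 0.833
```

Applied gene by gene, the same function would expose a gene such as psbA whose third-letter preference departs from the genome-wide bias.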

Genes for protein synthesis and related genes

We have identified coding sequences for 19 ribosomal proteins (large subunit proteins L33, L20, L14, L16, L22, L2, L23, and L21; small subunit proteins S7, S2, S14, S4, S18, S12, S11, S8, S3, S19, and S15). In addition, the genes infA (E. coli initiation factor 1), secX (recently identified as the ribosomal large subunit protein L36; Wada and Sako 1987) and rpo (rpoA, rpoB, and rpoC1-rpoC2 for the α, β, and β' subunits of RNA polymerase, respectively) were on the liverwort chloroplast genome (Fig. 1). It is of interest that ribosomal protein and related genes form a large cluster in the chloroplast genome (rpl23-rpl2-rps19-rpl22-rps3-rpl16-rpl14-rps8-infA-secX-rps11-rpoA genes), although others are scattered throughout the genome. The genes in the cluster have a similar order to the clusters reported for the E. coli ribosomal protein operons such as S10 (S10-L3-L4-rpl23-rpl2-rps19-rpl22-rps3-rpl16-L29-S17; Zurawski and Zurawski 1985), spc (rpl14-L24-L5-S14-rps8-L6-L18-S5-L30-secY-secX; Cerretti et al. 1983) and α (S13-rps11-S4-rpoA-L17; Bedwell et al. 1985). Three additional clusters of ribosomal protein genes are seen with the rps12' (exon 1)-rpl20, rpl33-rps18, and rps12' (exons 2 and 3)-rps7 genes. The ribosomal protein S12 gene (rps12) has been reported to be split between different DNA strands, indicating in vivo trans-splicing (Fukuzawa et al. 1986).

It is well known that the nuclear genome encodes genes for chloroplast proteins, and that proteins synthesized in the cytoplasm are then transported into chloroplasts and assembled with the organelle-synthesized proteins to form functional complexes such as ribosome particles. It is interesting that the ribosomal proteins (S15, S4, S8, and S7) in the 30S subunits encoded on the liverwort chloroplast genome are essential for the initial assembly of ribosomal protein-rRNA molecules (Dorne et al. 1984). Large ribosomal subunit components (L2 and L20) also form the major center of the assembly (Wittmann 1983). The gene encoding ribosomal protein L21 (rpl21) was not seen in tobacco, and the gene for ribosomal protein S16 (rps16) was not deduced from the liverwort chloroplast genome. This indicates that the chloroplast genomes of these two species may have undergone different gene rearrangements in the course of evolution.
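The claim that the liverwort cluster keeps the E. coli gene order can be stated as a subsequence test. The sketch below uses the gene orders quoted in the text, written with the liverwort gene names where the text does so. Concatenating the three E. coli operons in the order S10, spc, α is our assumption for illustration, and `is_subsequence` is a hypothetical helper.

```python
# Gene orders as quoted in the text.
LIVERWORT_CLUSTER = [
    "rpl23", "rpl2", "rps19", "rpl22", "rps3", "rpl16",
    "rpl14", "rps8", "infA", "secX", "rps11", "rpoA",
]
E_COLI_OPERONS = {
    "S10": ["S10", "L3", "L4", "rpl23", "rpl2", "rps19",
            "rpl22", "rps3", "rpl16", "L29", "S17"],
    "spc": ["rpl14", "L24", "L5", "S14", "rps8", "L6",
            "L18", "S5", "L30", "secY", "secX"],
    "alpha": ["S13", "rps11", "S4", "rpoA", "L17"],
}


def is_subsequence(short, long):
    """True if every item of `short` occurs in `long` in the same order."""
    remaining = iter(long)
    return all(gene in remaining for gene in short)


# Assumption: the operons are read in the order S10, spc, alpha.
e_coli_order = [g for name in ("S10", "spc", "alpha")
                for g in E_COLI_OPERONS[name]]
e_coli_genes = set(e_coli_order)
shared = [g for g in LIVERWORT_CLUSTER if g in e_coli_genes]
missing = [g for g in LIVERWORT_CLUSTER if g not in e_coli_genes]

print(len(shared), missing)                   # 11 ['infA']
print(is_subsequence(shared, e_coli_order))   # True
```

Eleven of the twelve cluster genes appear in the quoted operons, in the same relative order; infA is the only member of the liverwort cluster absent from the three operon lists given here.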

Genes for photosynthetic and electron transport systems

Genes for photosynthetic and electron transport polypeptides of the liverwort chloroplast genome were identified by comparing the amino acid sequences of the ORFs with those of photosynthetic and electron transport proteins deposited in the NBRF-PIR data base (Fig. 2). Genes (psaA and psaB) for the P700 chlorophyll a apoproteins of photosystem I (Fish et al. 1985) are closely linked with a very short spacer region (26 bp). The nucleotide sequence in this intergenic spacer is almost identical to the corresponding sequences in other species of plants, but the 5' and 3' flanking regions of the genes do not show any similarities. This spacer region may carry a control signal for transcription or translation of the downstream psaB gene.

Eight genes (psbA, psbB, psbC, psbD, psbE, psbF, psbG, and psbH) for photosystem II polypeptides were in the large single-copy region. Two of them (psbD-psbC) (Alt et al. 1984, Holschuh et al. 1984) are in tandem, with the 3' end of psbD overlapping the 5' end of psbC. A 32-kDa protein gene, psbA (Zurawski et al. 1982), retains a highly conserved structure among higher plants; it is believed to be responsible for resistance to the herbicide atrazine (Hirschberg and McIntosh 1983). The liverwort psbG gene encoding 243 amino acids was identified by comparison with the amino acid sequence of its maize counterpart, which is located in the photosystem II particles (Steinmetz et al. 1986). However, our cluster analysis of genes by computer showed that the psbG gene behaved differently from other components of the photosystem II particles (Ohyama et al. unpublished results). The coding regions for the psbE and psbF genes (Herrmann et al. 1984) are clustered on the opposite DNA strand next to the petA gene (Alt and Herrmann 1984).

Fig. 2. Genetic map of genes for photosynthesis in chloroplasts. Nomenclature of photosynthetic genes follows that of Crouse et al. (1985). Asterisks indicate genes with introns in their coding sequences.

The psbB operon consists of the psbB, ORF35, psbH, petB, and petD genes, which are cotranscribed as a single precursor RNA (Kohchi et al. in preparation). The amino acid sequence of the liverwort psbB protein shows a high degree of homology to that of the 51-kDa chlorophyll a apoprotein in spinach (Morris and Herrmann 1984). The psbH gene product is a 10-kDa phosphoprotein associated with photosystem II (Westhoff et al. 1986). ORF35 remains unidentified. We deduced ORF43 on the opposite DNA strand, in the spacer region between ORF35 and psbH, oriented divergently from the psbB operon. This ORF is actively transcribed during illumination (Kohchi et al. in preparation).

Genes for cytochrome f protein (petA) (Alt and Herrmann 1984), cytochrome b6 (petB) (Heinemeyer et al. 1984), and cytochrome b6/f complex subunit IV (petD) (Phillips and Gray 1984) are found in the liverwort chloroplast genome. The petA protein deduced from the DNA sequence may have an N-terminal sequence of 35 amino acids as a signal peptide, because the N-terminal portion of the protein is species-specific, as in other species of plants (Alt and Herrmann 1984). DNA sequence analysis also showed that the petB and petD genes have group II introns in their coding sequences (Fukuzawa et al. 1987). The petB and petD genes are cotranscribed with genes in the psbB operon, and the primary transcript is processed down to a bicistronic mRNA containing the petB-petD genes (Kohchi et al. in preparation).

Six genes (atpI-atpH-atpF-atpA, and atpB-atpE) for the chloroplast H+-ATPase (Henning and Herrmann 1986, Cozens et al. 1986), which consists of nine non-identical subunits, were identified by comparison of their amino acid sequences with those of genes in higher-plant chloroplast genomes. The atp gene organization in the liverwort chloroplast genome was the same as those in higher plants and similar to those of cyanobacteria and E. coli.

Newly identified groups of genes in liverwort chloroplast genome

A few genes common to the liverwort and tobacco chloroplast genomes, deduced from the entire DNA sequences, encode products that have not previously been described in chloroplasts. On the other hand, there are several genes that are detected in the liverwort chloroplast genome, but not in tobacco, and vice versa.

Seven newly identified genes (ndh1, ndh2, ndh3, ndh4, ndh4L, ndh5, and ndh6) in the liverwort chloroplast genome correspond to components of human mitochondrial NADH dehydrogenase (ND1, ND2, ND3, ND4, ND4L, ND5, and ND6, respectively) (Chomyn et al. 1985, Chomyn et al. 1986). In tobacco, the six genes ndhA, ndhB, ndhC, ndhD, ndhE, and ndhF correspond to ND1, ND2, ND3, ND4, ND4L, and ND5, respectively. The two contiguous tobacco ORFs, ORF138 and ORF99B, can be made to correspond to the human mitochondrial ND6 by shifting the frame in the tobacco sequence to obtain maximum matching (Shinozaki et al. 1986). Northern hybridization analysis showed that there is active transcription of ndh genes in liverwort chloroplasts (Kohchi et al., unpublished results). DNA sequence analysis of watermelon mitochondrial DNA fragments has found a portion of a watermelon mitochondrial gene corresponding to the ND1 gene in human mitochondria (Stern et al. 1986). Thus, the amino acid sequences from the three sources (liverwort chloroplast ndh1, human mitochondria ND1, and watermelon mitochondria URF1) had more than 35% homology to each other. This means that the liverwort chloroplast ndh1 gene is probably not a pseudogene of mitochondrial origin and that it has diverged independently from a common origin. Reports dealing with the hydrogenase-dependent processes, photoreduction and the dark oxyhydrogen reaction, in the chloroplasts of Chlamydomonas reinhardtii suggest that NADH(PQ) oxidoreductase is involved in the electron transport system for chloroplast respiration (Bennoun 1982, Maione and Gibbs 1986). This indicates that the products of chloroplast ndh genes may be components of the enzyme.

Fig. 3. Newly identified groups of genes and unidentified ORFs in the liverwort chloroplast genome. Genes ndh, frx, mbp, and lhcA encode subunits of NADH(PQ) oxidoreductase, iron-sulfur binding proteins, subunits of membrane-associated transport system, and light harvesting protein, respectively. ORFn indicates an open reading frame with a certain number (n) of amino acids.

The frx genes have been identified from the DNA sequence as genes for iron-sulfur proteins, based on the characteristic distribution of cysteine residues in the deduced polypeptides. In particular, the frxA and frxB gene products have repeated units of -C-X-X-C-X-X-C-X-X-X-C-P-, a typical amino acid sequence in bacterial ferredoxin (Howard et al. 1983, Minami et al. 1985). The gene product corresponding to frxA (psaC) has been identified in the Fe-S center of the photosystem I complex (Oh-oka et al. 1987, Hoj et al. 1987). The frxA gene originally was not described in the tobacco genome (Shinozaki et al. 1986). The frxA (psaC) gene in the tobacco genome has been deduced by the insertion of one nucleotide and shifting a frame in the tobacco sequence (Hayashida et al. 1987). Tobacco ORF167 in the single-copy region corresponded to the liverwort frxB gene. This observation indicates that the frxA and frxB genes are common to chloroplast genomes.

The frxC gene product, which contains nine cysteine residues, has a high degree of homology to the Fe-binding protein in nitrogen-fixing bacteria, the nifH gene products in Azotobacter vinelandii (Hausinger and Howard 1982) and the F202 gene in Rhodopseudomonas capsulata (Youvan et al. 1984, Hearst et al. 1985). The amino acid sequence (-G-X-X-X-G-K-S-) in the N-terminal portion is a conserved nucleotide binding site seen in the ATP-binding proteins (Higgins et al. 1986). Curiously, this gene was not detected in the tobacco chloroplast genome, indicating the possibility of a pseudogene for the nitrogenase component. However, immunological analysis gave a positive reaction for the liverwort chloroplast extract with antibody against the frxC gene product made in E. coli (Takahashi, personal communication).

We have deduced mbpX and mbpY genes from the liverwort chloroplast genome. The amino acid sequence of mbpX shows similarity to the sequences of inner membrane components of bacterial permeases such as the products of the genes hisP (Higgins et al. 1982), malK (Gilson et al. 1982), and oppD (Higgins et al. 1985) for the histidine, maltose, and oligopeptide transport systems, respectively. The mbpX gene product has, near its N-terminal portion, a typical amino acid sequence for a consensus nucleotide binding site, indicating a membrane transport system driven by nucleotide-derived energy in the liverwort chloroplasts. The hydrophobicity of the mbpY gene product is also similar to that of another component of the inner membrane permease, encoded by hisQ in Salmonella typhimurium (Higgins et al. 1982) and malF of E. coli (Froshauer and Beckwith 1984). These gene products may associate with components encoded on the nuclear genome and form a transport complex in chloroplast membranes. These genes could not be deduced from the tobacco chloroplast genome (Shinozaki et al. 1986).
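The two sequence motifs named in this section, the ferredoxin-type cysteine repeat and the nucleotide binding site, lend themselves to a simple pattern search over a deduced polypeptide. The sketch below is an illustration only: the patterns are transcribed from the motifs as printed in the text (X meaning any residue), `motif_starts` is a hypothetical helper, and both peptides are made-up toy sequences rather than real gene products.

```python
import re

# Motifs as printed in the text, with X rendered as "." (any residue).
FERREDOXIN_REPEAT = re.compile(r"C..C..C...CP")   # -C-X-X-C-X-X-C-X-X-X-C-P-
NUCLEOTIDE_SITE = re.compile(r"G...GKS")          # -G-X-X-X-G-K-S-


def motif_starts(pattern, protein):
    """0-based start positions of non-overlapping matches of the motif."""
    return [m.start() for m in pattern.finditer(protein)]


# Hypothetical peptides written for this example.
toy_frx = "MAHCIGCTQCVRACPTDKLEMCVGCKRCESACPTD"
toy_atp = "MTLGAVDGKSLL"

print(motif_starts(FERREDOXIN_REPEAT, toy_frx))  # [3, 21]
print(motif_starts(NUCLEOTIDE_SITE, toy_atp))    # [3]
```

A search of this kind only flags candidates; the assignments in the text also rest on overall sequence homology and, for frxA and frxC, on protein-level evidence.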

Unidentified open reading frames

Thirty-three open reading frames ranging from 29 (ORF29) to 2136 amino acid residues (ORF2136) remain unidentified in the liverwort chloroplast genome, although we attempted to identify their genes by extensive computer analysis. Most of the ORFs seem to be conserved in the liverwort and tobacco chloroplast genomes, indicating their active functions in chloroplasts (Table 3). Membrane spanning analysis with an algorithm of Klein et al. (1985) showed the predicted location of ORF products in the chloroplasts (Table 3). Functionally related genes are likely to form a cluster on the chloroplast genome (see Fig. 2). These findings suggest that an ORF in a gene cluster is related functionally to nearby genes. Further computer-aided cluster analysis of the amino acid composition divided the liverwort chloroplast genes into four groups (two for photosynthetic and two for housekeeping genes) (Ohyama et al., unpublished results).

Table 3. Predicted location of ORF gene products in chloroplasts and liverwort ORFs conserved in the chloroplast genomes of tobacco and elsewhere.

Liverwort ORF   Predicted location
ORF34           I
ORF135*         I
ORF29           I
ORF33           I
ORF30           I
ORF32           I
ORF36a          I
ORF513          P
ORF50           I
ORF370i         P
ORF2136         I
ORF62           I
ORF167*         P
ORF169          P
ORF316          P
ORF36b          I
ORF184          I
ORF434          I
ORF40           I
ORF38           I
ORF42a          P
ORF31           I
ORF37           I
ORF42b          I
ORF203*         I
ORF35           I
ORF43           I
ORF69           P
ORF320          I
ORF392          P
ORF464          I
ORF1068         I
ORF465          I

Remarks and references (in printed order; the alignment with individual rows is not recoverable): +; +; ORF2 (Deno and Sugiura, 1983); ORF509 (Sugita et al., 1985); ORF1708 (Shinozaki et al., 1986); URF62 in wheat (Quigley and Weil, 1985); ORF82 (Shinozaki et al., 1986); ORF158 (Shinozaki et al., 1986); ORF512 (Shinozaki et al., 1986); +; ORF184 (Shinozaki et al., 1986); ORF149 in E. gracilis (Montandon and Stutz, 1983); ORF229 (Shinozaki et al., 1986); ORF40 in Cyanophora paradoxa (D.A. Bryant, personal communication); +; +; +; +; X gene in spinach (Westhoff, 1985); ORF73; +; ORF74B (Shinozaki et al., 1986); +, in psbB operon (Kohchi et al., unpublished); +, divergent to psbB operon, light-inducible (Kohchi et al., unpublished); +, ribosomal protein (Kohchi et al., unpublished); ORF313 (Shinozaki et al., 1986); ORF393 (Shinozaki et al., 1986); ORF228 (Shinozaki et al., 1986); ORF1244 (Shinozaki et al., 1986).

Letters I and P show integral and peripheral locations of the ORFs in chloroplasts, respectively, predicted by computer-aided membrane spanning analysis (Klein et al., 1985). Symbols + or - indicate the presence or absence of the corresponding ORFs in the tobacco DNA sequence (EMBL data library, release 12.0). Asterisks indicate ORFs with introns.

The complete nucleotide sequences of chloroplast genomes from two species of plants gave us a tremendous amount of information on photosynthetic and chloroplast genetic systems. Identification of new genes (ORFs) and the elucidation of their function have been made possible by the information on DNA sequences. However, the complete DNA sequences did not yield any information on, among other topics, the chloroplast DNA replication system (DNA replication origin and DNA polymerase). For this investigation, we need a DNA transformation system in chloroplasts.
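The integral or peripheral call for each ORF product comes from the algorithm of Klein et al. (1985), which is not reproduced here. As a rough stand-in for how hydrophobicity can separate the two classes, the sketch below uses a sliding-window average of the Kyte-Doolittle hydropathy scale, a different and simpler method. The 19-residue window, the 1.6 cutoff, the function names, and both test peptides are assumptions made for illustration.

```python
# Kyte-Doolittle hydropathy values (not the Klein et al. discriminant).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def max_window_hydropathy(protein, window=19):
    """Highest mean hydropathy over any window of `window` residues."""
    scores = [KD[aa] for aa in protein.upper()]
    if len(scores) <= window:
        return sum(scores) / len(scores)
    return max(
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    )


def looks_integral(protein, threshold=1.6):
    """Crude call: a strongly hydrophobic stretch suggests a membrane span."""
    return max_window_hydropathy(protein) >= threshold


# Hypothetical peptides: one with a hydrophobic stretch, one hydrophilic.
toy_membrane = "MKT" + "LIVALLAVFLGILAVVLLA" + "KRDE"
toy_soluble = "MSDEKRNQTSDEKRNQGSDEKRNQTPEDKRS"

print(looks_integral(toy_membrane))  # True
print(looks_integral(toy_soluble))   # False
```

A call made this way would correspond to the I and P labels of Table 3 only loosely, since the published predictions rely on a discriminant function rather than a fixed hydropathy cutoff.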

Acknowledgements

This research was supported in part by a Grant-in-Aid for Special Research Projects from the Ministry of Education, Science, and Culture, Japan. We appreciate the continued encouragement of Prof. Dr Y. Yamada, Director of the Research Center for Cell and Tissue Culture, Faculty of Agriculture, Kyoto University. We also thank Prof. Dr M. Kanehisa of the Institute for Chemical Research, Kyoto University, for his help with computer analysis.

References Alt J and Herrmann RG (1984) Nucleotide sequence of the gene for preapocytochromefin the spinach plastid chromosome. Curr Genet 8:551-557 Alt J, Morris J, Westhoff P and Herrmann RG (1984) Nucleotide sequence of the clustered genes for the 44 kd chlorophyll a apoprotein and the '32 kd'-like protein of the photosystem II reaction center in the spinach plastid chromosome. Curt Genet 8:597 606 Anderson S, Bankier AT, Barrell BG, de Bruijn MHL, Coulson AR, Drouin J, Eperon IC, Nierlich DP, Roe BA, Sanger F, Schreier PH, Smith AJH, Staden R and Young IG (1981) Sequence and organization of the human mitochondrial genome. Nature 290:457-465 Bedwell D, Davis G, Gosink M, Post L, Nomura M, Kestler H, Zengel JM and Lindahl L (1985) Nucleotide sequence of the alpha ribosomal protein operon of Escherichia coli. Nucl Acids Res 13:3891-3903 Bennoun P (1982) Evidence for a respiratory chain in the chloroplast. Proc Natl Acad Sci USA 79:4352-4356

20 Bonitz SG, Berlani R, Coruzzi G, Li M, Macino G, Nobrega FG, Nobrega MP, Thalenfeld BE and Tzagoloff A (1980) Codon recognition rules in yeast mitochondria. Proc Natl Acad Sci USA 77:3167-3170 Cerretti DP, Dean D, Davis GR, Bedwell DM and Nomura M (1983) The spc ribosomal protein operon of Escherichia coli: sequence and cotranscription of the ribosomal protein genes and a protein export gene. Nucl Acids Res 11:2599-2616 Chomyn A, Mariottini P, Cleeter MWJ, Ragan CI, Matsuno-Yagi A, Hatefi Y, Doolittle RF and Attardi G (1985) Six unidentified reading frames of human mitochondrial DNA encode components of the respiratory-chain NADH dehydrogenase. Nature 314:592-597 Chomyn A, Cleeter MWJ, Ragan CI, Riley M, Doolittle RF and Attardi G (1986) URF6, last unidentified reading frame of human mtDNA, codes for an NADH dehydrogenase subunit. Science 234:614-618 Cozens AL, Walker JE, Phillips AL, Huttly AK and Gray JC (1986) A sixth subunit of ATP synthase, an F0 component, is encoded in the pea chloroplast genome. EMBO J 5:217-222 Crick FHC (1966) The genetic code: III. Sci Amer 254(4): 55-62. Crouse EJ, Schmitt JM and Bohnert HJ (1985) Chloroplast and cyanobacterial genomes, genes and RNAs: a compilation. Plant Mol Biol Reporter 3:43-89 Deno H and Sugiura M (1983) The nucleotide sequences of tRNAS~r(GCU) and tRNA~n(UUG) genes from tobacco chloroplasts. Nucl Acids Res 11:2185-2192 Dome AM, Lescure AM and Mache R (1984) Site of synthesis of spinach chloroplast ribosomal proteins and formation of incomplete ribosomal particles in isolated chloroplasts. Plant Mol Biol 3:83-90 Ellis RJ (1981) Chloroplast proteins: synthesis, transport and assembly. Ann Rev Plant Physiol 32:111 137 Fish LE, Kuck U and Bogorad L (1985) Two partially homologous adjacent light-inducible maize chloroplast genes encoding polypeptides of the p700 chlorophyll a-protein complex of photosystem I. 
J Biol Chem 260:1413-1421 Fournier MJ and Ozeki H (1985) Structure and organization of the transfer ribonucleic acid genes of Escherichia coli K-12. Microbiol Rev 49:379-397 Froshauer S and Beckwith J (1984) The nucleotide sequence of the gene for malF protein, an inner membrane component of the maltose transport system of Escherichia coli. J Biol Chem 259:10896-10903 Fukuzawa H, Kohchi T, Shirai H, Ohyama K, Umesono K, Inokuchi H and Ozeki H (1986) Coding sequences for chloroplast ribosomal protein S12 from the liverwort, Marchantia polymorpha, are separated far apart on the different DNA strands. FEBS Lett 198:11-15 Fukuzawa H, Yoshida T, Kohchi T, Okumura T, Sawano Y and Ohyama K (1987) Splicing of group II introns in mRNAs coding cytochrome b6 and subunit IV in the liverwort Marchantia polymorpha chloroplast genome: Exon specifying a region coding for two genes with the spacer region. FEBS Lett 220:61-66 Gilson E, Nikaido H and Hofnung M (1982) Sequence of the malK gene in E. coli K 12. Nucl Acids Res 10:7449-7458 Hausinger RP and Howard JB (1982) The amino acid sequence of the nitrogenase iron protein from Azotobacter vinelandii. J Biol Chem 257:2483-2487 Hayashida N, Matsubayashi T, Shinozaki K, Sugiura M, Inoue K and Hiyama T (1987) The gene for the 9 kd polypeptide, a possible apoprotein for the iron-sulfur centers A and B of the photosystem I complex, in tobacco chloroplast DNA. Curr Genet 12:247-250 Hearst JE, Alberti M and Doolittle RF (1985) A putative nitrogenase reductase gene found in the nucleotide sequences from the photosynthetic gene cluster of R. capsulata. Cell 40: 219 220

Heinemeyer W, Alt J and Herrmann RG (1984) Nucleotide sequence of the clustered genes for apocytochrome b6 and subunit 4 of the cytochrome b/f complex in the spinach plastid chromosome. Curr Genet 8:543-549
Henning J and Herrmann RG (1986) Chloroplast ATP synthase of spinach contains nine nonidentical subunit species, six of which are encoded by plastid chromosomes in two operons in a phylogenetically conserved arrangement. Mol Gen Genet 203:117-128
Herrmann RG, Alt J, Schiller B, Widger WR and Cramer WA (1984) Nucleotide sequence of the gene for apocytochrome b-559 on the spinach plastid chromosome: implications for the structure of the membrane protein. FEBS Lett 176:239-244
Higgins CF, Haag PD, Nikaido K, Ardeshir F, Garcia G and Ames GFL (1982) Complete nucleotide sequence and identification of membrane components of the histidine transport operon of S. typhimurium. Nature 298:723-727
Higgins CF, Hiles ID, Whalley K and Jamieson DJ (1985) Nucleotide binding by membrane components of bacterial periplasmic binding protein-dependent transport systems. EMBO J 4:1033-1040
Higgins CF, Hiles ID, Salmond GPC, Gill DR, Downie JA, Evans IJ, Holland IB, Gray L, Buckel SD, Bell AW and Hermodson MA (1986) A family of related ATP-binding subunits coupled to many distinct biological processes in bacteria. Nature 323:448-450
Hirschberg J and McIntosh L (1983) Molecular basis of herbicide resistance in Amaranthus hybridus. Science 222:1346-1349
Høj PB, Svendsen I, Scheller HV and Møller BL (1987) Identification of a chloroplast-encoded 9-kDa polypeptide as a 2[4Fe-4S] protein carrying centers A and B of photosystem I. J Biol Chem 262:12676-12684
Holschuh K, Bottomley W and Whitfeld PR (1984) Structure of the spinach chloroplast genes for the D2 and 44 kd reaction-center proteins of photosystem II and for tRNASer(UGA). Nucl Acids Res 12:8819-8834
Howard JB, Lorsbach TW, Ghosh D, Melis K and Stout CD (1983) Structure of Azotobacter vinelandii 7Fe ferredoxin. J Biol Chem 258:508-522
Klein P, Kanehisa M and DeLisi C (1985) The detection and classification of membrane-spanning proteins. Biochim Biophys Acta 815:468-476
Maione TE and Gibbs M (1986) Association of the chloroplastic respiratory and photosynthetic electron transport chains of Chlamydomonas reinhardtii with photoreduction and the oxyhydrogen reaction. Plant Physiol 80:364-368
Minami Y, Wakabayashi S, Wada K, Matsubara H, Kerscher L and Oesterhelt D (1985) Amino acid sequence of a ferredoxin from thermoacidophilic Archaebacterium, Sulfolobus acidocaldarius. Presence of an N6-monomethyllysine and phyletic consideration of Archaebacteria. J Biochem 97:745-753
Montandon PE and Stutz E (1983) Nucleotide sequence of a Euglena gracilis chloroplast genome region coding for the elongation factor Tu; evidence for a spliced mRNA. Nucl Acids Res 11:5877-5891
Morris J and Herrmann RG (1984) Nucleotide sequence of the gene for the P680 chlorophyll a apoprotein of the photosystem II reaction center from spinach. Nucl Acids Res 12:2837-2850
Oh-oka H, Takahashi Y, Wada K, Matsubara H, Ohyama K and Ozeki H (1987) The 8 kDa polypeptide in photosystem I is a probable candidate of an iron-sulfur center protein coded by the chloroplast gene frxA. FEBS Lett 218:52-54
Ohyama K, Fukuzawa H, Kohchi T, Shirai H, Sano T, Sano S, Umesono K, Shiki Y, Takeuchi M, Chang Z, Aota S, Inokuchi H and Ozeki H (1986) Chloroplast gene organization deduced from complete sequence of liverwort Marchantia polymorpha chloroplast DNA. Nature 322:572-574

Phillips AL and Gray JC (1984) Location and nucleotide sequence of the gene for the 15.2 kDa polypeptide of the cytochrome b-f complex from pea chloroplasts. Mol Gen Genet 194:477-484
Quigley F and Weil JH (1985) Organization and sequence of five tRNA genes and of an unidentified reading frame in the wheat chloroplast genome: evidence for gene rearrangements during the evolution of chloroplast genomes. Curr Genet 9:495-503
Schön A, Krupp G, Gough S, Berry-Lowe S, Kannangara CG and Söll D (1986) The RNA required in the first step of chlorophyll biosynthesis is a chloroplast glutamate tRNA. Nature 322:281-284
Shinozaki K, Ohme M, Tanaka M, Wakasugi T, Hayashida N, Matsubayashi T, Zaita N, Chunwongse J, Obokata J, Yamaguchi-Shinozaki K, Ohto C, Torazawa K, Meng BY, Sugita M, Deno H, Kamogashira T, Yamada K, Kusuda J, Takaiwa F, Kato A, Tohdoh N, Shimada H and Sugiura M (1986) The complete nucleotide sequence of the tobacco chloroplast genome: its gene organization and expression. EMBO J 5:2043-2049
Steinback KE, Bonitz S, Arntzen CJ and Bogorad L (1985) (eds) Molecular Biology of the Photosynthetic Apparatus. Cold Spring Harbor Laboratory, Cold Spring Harbor, NY
Steinmetz AA, Castroviejo M, Sayre RT and Bogorad L (1986) Protein PSII-G: An additional component of photosystem II identified through its plastid gene in maize. J Biol Chem 261:2485-2488
Stern DB, Bang AG and Thompson WF (1986) The watermelon mitochondrial URF-1 gene: evidence for a complex structure. Curr Genet 10:857-869
Sugita M, Shinozaki K and Sugiura M (1985) Tobacco chloroplast tRNALys(UUU) gene contains a 2.5-kilobase-pair intron: an open reading frame and a conserved boundary sequence in the intron. Proc Natl Acad Sci USA 82:3557-3561
Wada A and Sako T (1987) Primary structures of and genes for new ribosomal proteins A and B in Escherichia coli. J Biochem 101:817-820
Westhoff P (1985) Transcription of the gene encoding the 51 kd chlorophyll a-apoprotein of the photosystem II reaction center from spinach. Mol Gen Genet 201:115-123
Westhoff P, Farchaus JW and Herrmann RG (1986) The gene for the Mr 10,000 phosphoprotein associated with photosystem II is part of the psbB operon of the spinach plastid chromosome. Curr Genet 11:165-169
Wittmann HG (1983) Architecture of prokaryotic ribosomes. Ann Rev Biochem 52:35-65
Youvan DC, Bylina EJ, Alberti M, Begusch H and Hearst JE (1984) Nucleotide and deduced polypeptide sequences of the photosynthetic reaction-center, B870 antenna, and flanking polypeptides from R. capsulata. Cell 37:949-957
Zurawski G, Bohnert HJ, Whitfeld PR and Bottomley W (1982) Nucleotide sequence of the gene for the Mr 32,000 thylakoid membrane protein from Spinacia oleracea and Nicotiana debneyi predicts a totally conserved primary translation product of Mr 38,950. Proc Natl Acad Sci USA 79:7699-7703
Zurawski G and Zurawski SM (1985) Structure of the Escherichia coli S10 ribosomal protein operon. Nucl Acids Res 13:4521-4526
