Journal of Virological Methods, 38 (1992) 47-60
© 1992 Elsevier Science Publishers B.V. / All rights reserved / 0166-0934/92/$05.00


VIRMET 01 330

Rapid sequencing of the Sendai virus 6.8 kb large (L) gene through primer walking with an automated DNA sequencer

H. Giesecke (a), B. Obermaier (b), H. Domdey (b) and W.J. Neubert (a)

(a) Max-Planck-Institut für Biochemie, Abteilung für Virusforschung, and (b) Laboratorium für Molekulare Biologie - Genzentrum - der Ludwig-Maximilians-Universität München, Martinsried, F.R.G.

(Accepted 19 December 1991)

Summary

The determination of the complete DNA sequence of the large (L) polymerase gene of Sendai virus strain Fushimi was used to explore the potential and feasibility of primer walking with fluorescent dye-labelled dideoxynucleotide terminators on an automated ABI DNA sequencer. The rapid determination of the complete sequence demonstrated that this approach is a time- and cost-saving alternative to classical sequencing techniques. Analysis of the data revealed that the L gene of Sendai virus strain Fushimi consists of exactly 6800 nucleotides and contains a single open reading frame encoding a protein of 252.876 kDa. In contrast to Sendai virus strain Enders, the L mRNA of strain Fushimi is monocistronic. Comparison of the deduced amino acid sequences of the L genes of three different Sendai virus strains confirmed the existence of conserved as well as variable regions in the L protein and revealed a high degree of conservation in the carboxyterminal third. Furthermore, functional amino acid sequence motifs, such as elements of RNA-dependent RNA polymerases and ATP-binding sites as postulated previously, were identified.

Sendai virus; L gene; Primer walking; Automated DNA sequencing; Fluorescent dye-terminators

Correspondence to: W.J. Neubert, Max-Planck-Institut für Biochemie, Abteilung für Virusforschung, D-8033 Martinsried, FRG.


Introduction

Sendai virus is a non-segmented, negative-strand RNA virus belonging to the family Paramyxoviridae. In the transcriptive and replicative complex, i.e. the viral nucleocapsid, the RNA genome is associated with NP and P proteins and with catalytic amounts of the large (L) protein. With an estimated molecular weight of 253 kDa, the large size of the L protein suggests that it carries most of the enzymatic activities required for initiation, elongation, and termination during both transcription and replication, as well as for methylation, capping, and polyadenylation of viral mRNAs. Furthermore, Einberger et al. (1990) demonstrated that the Sendai virus L protein has a protein kinase activity that phosphorylates the other proteins of the viral nucleocapsid in vitro.

The nucleotide sequences and deduced amino acid sequences of the L genes of Sendai virus strains Z and Enders have been known since 1986 (Shioda et al., 1986; Morgan and Rakestraw, 1986). The main difference between them concerned the 5' terminal nucleotide sequence, which also called into question the coding capacity of the respective transcripts. In order to establish the exact nucleotide sequence and to obtain more information about the structure of L genes and their gene products in Sendai viruses, the complete L gene of another Sendai virus strain, named Fushimi, was sequenced.

Of the numerous new techniques developed for rapid DNA sequencing, those that combine fluorescent dyes with automated DNA sequencers (Smith et al., 1985; Ansorge et al., 1986; Prober et al., 1987) have accelerated the sequencing procedure considerably. As a result, the rate-limiting part of DNA sequence determination has shifted to the steps preceding the actual sequencing reactions, i.e. mainly to converting large DNAs into shorter, sequenceable fragments (subcloning) and to isolating and preparing the DNA to be sequenced.
The primer walking strategy, in which primers synthesized according to available sequence data are used for the stepwise determination of a longer sequence, is one way to circumvent the time-consuming steps of DNA subcloning and sample preparation. For the original protocol (Smith et al., 1985) of the Applied Biosystems DNA sequencing system, Kaiser et al. (1989) showed that primer walking with primers carrying different fluorescent dyes is not practicable. Recently, a modification of this sequencing protocol was introduced that allows the use of unlabelled primers in combination with four different fluorescein-labelled nucleotide terminators (Schmidt, 1990). This protocol is similar to the sequencing method developed by Prober et al. (1987). We therefore decided to examine this primer walking approach for the rapid determination of the complete nucleotide sequence of the Fushimi strain L gene.


Materials and Methods

Virus and RNA

The Fushimi strain of Sendai virus was obtained from the American Type Culture Collection (Rockville, MD) (ATCC No. VR-105, Sendai/52). Stock virus was grown in 11-day-old chicken embryos at 33°C and subsequently purified as described previously (Neubert and Hofschneider, 1983). Total cellular RNA was prepared by the guanidinium-isothiocyanate lysis procedure (Chirgwin et al., 1979).

cDNA cloning

The genomic cDNA of the L gene of Sendai virus strain Fushimi was generated by reverse transcription using three different primers corresponding to nucleotide positions 8509-8531, 10456-10474, and 13393-13411 (all position numbers according to Shioda et al., 1986). The second strand was synthesized by polymerase chain reaction (PCR) using the oligonucleotides 10798-10779, 13537-13517, and 15383-15363 in the three reactions, respectively. Thus, three different PCR fragments of the 6800 bp L gene were obtained (Fig. 1, panel II; nucleotide position 1 of the L gene corresponds to position 8527 of the Sendai genome according to Shioda et al., 1986), ligated into the plasmid vector pBluescript II SK (Stratagene) and subsequently assembled into a single clone. For the sequencing reactions, the recombinant plasmid containing the full-length L gene was prepared by the alkaline lysis procedure and purified by differential precipitation with polyethylene glycol according to Sambrook et al. (1989).
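The coordinate bookkeeping in this paragraph can be illustrated with a short sketch. It converts Sendai genome positions into L gene positions using the stated offset (L gene position 1 = genome position 8527) and checks that the three PCR fragments together span the 6800 bp gene. The fragment labels A, B, and C and all function names are assumptions made for the illustration; only the primer coordinates come from the text.

```python
# Offset between Sendai genome numbering (Shioda et al., 1986) and L gene numbering.
L_GENE_START = 8527   # genome position of L gene nucleotide 1 (from the text)
L_GENE_LENGTH = 6800  # length of the L gene in nucleotides (from the text)

# Outer genome coordinates of the three PCR fragments: (5' end of the
# reverse-transcription primer, 5' end of the PCR primer). The labels are hypothetical.
FRAGMENTS = {"A": (8509, 10798), "B": (10456, 13537), "C": (13393, 15383)}


def genome_to_l(pos: int) -> int:
    """Convert a genome position into an L gene position."""
    return pos - L_GENE_START + 1


def covers_gene(fragments: dict) -> bool:
    """Return True if the fragments tile the whole L gene without a gap."""
    reach = L_GENE_START - 1  # last genome position covered so far
    for start, end in sorted(fragments.values()):
        if start > reach + 1:
            return False  # gap before this fragment
        reach = max(reach, end)
    return reach >= L_GENE_START + L_GENE_LENGTH - 1


def overlaps(fragments: dict) -> list:
    """Lengths (in nucleotides) of the overlaps between neighbouring fragments."""
    spans = sorted(fragments.values())
    return [prev[1] - nxt[0] + 1 for prev, nxt in zip(spans, spans[1:])]


if __name__ == "__main__":
    print(covers_gene(FRAGMENTS), overlaps(FRAGMENTS))
```

Under these assumptions the neighbouring fragments share overlaps of a few hundred nucleotides, which is what allows them to be assembled into a single full-length clone.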


Fig. 1. Cloning and sequencing strategies of the Sendai virus L gene. I: Scale in kilobases; according to Shioda et al. (1986) the L gene starts at nucleotide position 8527 of the Sendai genome. II: Clones A, B, and C represent three different PCR-fragments which were assembled to the full-length L cDNA in subsequent cloning steps. III: Positions of 35 oligonucleotides used as primers for sequencing the L gene. Arrows indicate the directions of the respective sequencing reactions.
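As a rough illustration of how walking primers such as those in panel III might be laid out along a known homologous sequence, the sketch below places primer start positions at a fixed spacing on both strands. The function name, the fixed spacing, and the mirrored placement on the opposite strand are assumptions for the illustration; the actual 35 primer positions used in this study are not reproduced here.

```python
def plan_walking_primers(gene_length: int, spacing: int) -> list:
    """Return (strand, start) tuples for evenly spaced walking primers.

    '+' primers read towards higher positions, '-' primers towards lower ones.
    Positions are 1-based and refer to the sense strand.
    """
    forward = list(range(1, gene_length + 1, spacing))
    # Mirror the forward starts onto the opposite strand so that both
    # strands are covered at the same density.
    reverse = [gene_length - (start - 1) for start in forward]
    return [("+", s) for s in forward] + [("-", s) for s in reverse]


if __name__ == "__main__":
    plan = plan_walking_primers(gene_length=6800, spacing=400)
    print(len(plan), plan[:3])
```

In practice each read must be longer than the spacing, so that consecutive reads on the same strand overlap and the next primer can be chosen from confirmed sequence.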


Synthetic oligonucleotides

The synthesis of L gene-specific oligodeoxyribonucleotides was performed on an Applied Biosystems Model 380B DNA Synthesizer using phosphoramidite chemistry. Selection of the primer sequences followed the L gene data published for Sendai virus strain Z by Shioda et al. (1986). Thus, 35 oligomers were prepared for sequencing the entire L gene. The locations of the primers and the directions of the resulting sequencing reactions are shown in Fig. 1, panel III.

Sequencing reactions with fluorescein-labelled dye-primers (M13 universal and reverse sequencing primer)

Four sequencing reactions were carried out in 5 µl (A and C) or 10 µl (G and T) reaction mixtures. For the A and C reactions, the mixtures contained 250 ng plasmid DNA, 0.4 pmol dye-labelled primer (supplied by Applied Biosystems), 1 µl d/ddNTP mix, 1 µl 5× cycle sequencing buffer (400 mM Tris-HCl, 100 mM ammonium sulfate, 25 mM MgCl2), and 0.57 U Taq polymerase. For the G and T reactions, the mixtures contained 500 ng plasmid DNA, 0.8 pmol dye-labelled primer, 2 µl d/ddNTP mix, 2 µl 5× cycle sequencing buffer, and 1.14 U Taq polymerase. In the deoxy/dideoxy termination mixes, the following concentrations were used according to the manufacturer's protocol: d/ddATP mix (1.5 mM ddATP, 62.5 µM dATP, 250 µM dCTP, 375 µM c7dGTP, 250 µM dTTP); d/ddCTP mix (750 µM ddCTP, 250 µM dATP, 62.5 µM dCTP, 375 µM c7dGTP, 250 µM dTTP); d/ddGTP mix (125 µM ddGTP, 250 µM dATP, 250 µM dCTP, 94 µM c7dGTP, 250 µM dTTP); d/ddTTP mix (1.25 mM ddTTP, 250 µM dATP, 250 µM dCTP, 375 µM c7dGTP, 62.5 µM dTTP). The reactions were overlaid with 20 µl light mineral oil. Cycle conditions in a Perkin Elmer Thermal Cycler were: 15 cycles of 30 s at 95°C, 30 s at 55°C and 1 min at 70°C, followed by 15 cycles of 30 s at 95°C and 1 min at 70°C. The four reactions were combined in one tube, ethanol-precipitated, and dissolved in 6 µl deionized formamide/10 mM EDTA (pH 8.0). Before loading on a 6.5% acrylamide/8 M urea sequencing gel, the sample was heat-denatured at 90°C for 2 min.

Sequencing reactions with dye-labelled terminators

1.6 µg plasmid DNA in 8.75 µl H2O and 4 µl primer (0.8 pmol/µl) were added to 7.5 µl prereaction mixture containing 5× TACS buffer (400 mM Tris-HCl, 10 mM MgCl2, 100 mM ammonium sulfate, pH 9.0), 750 µM dITP, 150 µM dATP, 150 µM dTTP, 150 µM dCTP, G, T, A and C DyeDeoxy terminators (Applied Biosystems), and 2 U Taq polymerase. To prevent evaporation, the mixture was overlaid with 20 µl mineral oil. Twenty-five cycles were performed with 1 s at 98°C, 15 s at 50°C and 4 min at 60°C. Excess dideoxynucleotide terminators were removed by centrifugation through Sephadex G50 spin columns (DuPont NEK 529). The precipitated DNA was dissolved and heat-denatured as described above. Sequencing and analysis of the data were carried out on the Applied Biosystems automated DNA sequencer 373A.

Computer analysis

Secondary structure predictions and the analysis and comparison of L gene sequences were performed on a MicroVAX II computer using the programs of Devereux et al. (1984).

Results

One of the first questions to be answered in this sequencing project was whether the data obtained from the two different sequencing protocols, using either dye-labelled primers or dye-labelled terminators, were of similarly high quality. In general, more nucleotides (average of 450) could be resolved

Fig. 2. Comparison between the data outputs of the sequencing reactions with fluorescein-labelled dye-primers (a) and fluorescein-labelled dye-terminators (b). The indicated numbers (100 to 160) of these primer reactions correspond to L gene nucleotide positions 16 to 76.

per sequencing reaction with the fluorescein-labelled primers than with dye-labelled terminators (average of 350-400 nucleotides). However, the DNA sequence data from both protocols contained the same extremely low number of unidentified bases up to nucleotides 350-400. Although the signal intensities of the peaks corresponding to the dideoxynucleotide-terminated DNA fragments were less uniform when the dye-terminator protocol was used, this did not result in a higher number of ambiguities in the final sequence data outputs. In Fig. 2, processed DNA sequence data outputs are shown for comparison. There was therefore no fundamental objection to using the primer walking approach.

Since the DNA sequence of the L gene of Sendai virus strain Z had previously been established by Shioda et al. (1986), primer sequences for the 6.8 kb L gene of Sendai virus strain Fushimi could be selected at distances of about 300-350 nucleotides from each other along this homologous sequence. Thus, 35 oligomers with lengths between 16 and 24 nucleotides were used for complete sequencing of the 6.8 kb insert. Most of the sequence was determined by analysis of both strands (Fig. 1, panel III). In all cases, primers were effective and none required redesigning due to problems resulting, e.g., from multipriming.

Nucleotide sequence and coding potential of the L gene

The complete nucleotide sequence of the L gene of Sendai virus strain Fushimi was determined and deposited in the EMBL, GenBank, and DDBJ Nucleotide Sequence Databases under the accession number X58886. The L mRNA, representing almost half of the entire genome, spans exactly 6800 nucleotides excluding the poly(A) tail. It contains a single continuous open reading frame starting at the first AUG at position 29 and terminating at a UAA stop codon at position 6713, backed up by a second in-frame UAA termination codon that follows immediately. In contrast to the L gene sequence of Sendai virus strain Enders (Morgan and Rakestraw, 1986), our results identified a monocistronic L mRNA, as claimed for Sendai strain Z (Shioda et al., 1986) as well as for other L genes of paramyxoviruses (Galinski et al., 1988; Blumberg et al., 1988).

In agreement with other Sendai genes (Shioda et al., 1983, 1986; Hidaka et al., 1984), the consensus sequences 5' AGGGUGAAUG 3' (a) and 5' AGUAAGAAAAA 3' (b) were found at the 5'- and 3'-termini, respectively, of the L mRNA. The complementary orientations of these consensus stretches represent the signals for initiation (a) and termination (b) of transcription as well as for polyadenylation (b) of the mRNA by the viral RNA-dependent RNA polymerase.

Furthermore, a second open reading frame in the Sendai L gene could be identified in the genome sense-strand, starting at nucleotide position 1062 and ending at position 313. The corresponding peptide chain would consist of 249 amino acids, which is longer than that of the Sendai virus C' nonstructural protein. Moreover, this is the only long open reading frame in the Sendai minus-strand genomic RNA capable of coding for more than 135 amino acids. This frame is not flanked by the above-mentioned consensus sequences found in other Sendai genes, but its AUG start codon lies in a favoured sequence context (AxxATGG) for initiation of eukaryotic translation, with an A residue in the -3 position and a G residue in the +4 position according to Kozak (1986). Although antibodies were raised against the peptide chain corresponding to this open reading frame, the existence of this hypothetical reverse peptide in infected cells could not be proved so far.

Primary structure and functional motifs of the L protein

The long open reading frame of the Sendai Fushimi L gene contains 2228 codons and could code for a large protein with a calculated Mr of 252,876 (about 253 kDa), which agrees with results obtained from Western blot staining with monoclonal antibodies against the L protein (Einberger et al., 1990). Analysis of the deduced amino acid sequence of the L gene product indicated a content of leucine and isoleucine more than 1.6 times higher, and a content of alanine lower by the same factor, than in the 'average' protein defined by Dayhoff et al. (1978).

Kamer and Argos (1984) examined a number of viral RNA-dependent RNA polymerases and identified a conserved 16-residue motif consisting of a central Asp-Asp dipeptide (D-D; usually preceded by a glycine) surrounded by hydrophobic residues. This motif is proposed to represent the key functional domain for template recognition and/or phosphodiester bond formation in viral transcriptases. Structurally related motifs with detectable homology to the conserved sequence, such as SDD (position 1430), NDDD (position 1466), and LDD (position 1649), are also present in the Sendai virus L protein and mark its hypothetical active sites. The pentapeptide QGDNQ constitutes the longest continuous stretch of invariant amino acids in the comparison of six polymerases of unsegmented negative-strand RNA viruses and may be considered another variant of the conserved D-D motif (Barik et al., 1990). This pentapeptide, also present in the Sendai Fushimi L protein in an identical amino acid order, is located at position 771.

Two consensus ATP-binding sites have been identified in the Sendai L protein on the basis of their similarity to a motif found in protein kinases (Kamps et al., 1984). At least three essential activities encoded by the L protein must involve binding and utilization of ATP: (i) binding of substrate rNTPs during polymerization, (ii) polyadenylation, and (iii) protein kinase activity responsible for specific phosphorylation of the NP and P proteins (Einberger et al., 1990). These ATP-binding sites have been shown to consist of a glycine-rich motif in combination with a lysine within a distance of 16-28 residues. In the Sendai L protein, this conserved motif was found at position 1733 in the form Kx5Kx4Gx2Gx2GxG (x represents any amino acid; a number following x gives the count of such residues) and a second time at position 1782 as Kx21GxGxG.

Comparison of the L genes of different Sendai virus strains

In order to locate functionally homologous or strain-specific domains in the L protein, we aligned and compared the sequences of the L genes of three different Sendai virus strains. The changes in nucleotide sequences and the corresponding exchanges in amino acid sequences are listed in Table 1. The nucleotide sequence comparison between the Fushimi and Z strains did not show any insertions or deletions. In these two strains, 99.4% of the nucleotides and the same percentage of amino acids were found to be conserved. Forty-one nucleotide mutations (interestingly, almost half being C/T transitions or vice versa) were identified, and 13 of them led to an amino acid exchange.

When the Fushimi strain L gene sequence was compared with that of the Enders strain, two single-nucleotide deletions (at positions 374 and 385) as well as two single-nucleotide insertions (positions 85 and 613) were found. Fig. 2 shows this part of the sequence, including the point of insertion at nucleotide position 85 (corresponding to position 170 in this primer reaction). On the other hand, the nucleotide and deduced amino acid sequences were 99.8% homologous between these two strains. Only 13 base mutations, resulting in 5 amino acid exchanges, distinguish the Fushimi strain L gene from its counterpart in the Enders strain; these two strains therefore seem to be more closely related to each other than the Fushimi strain is to the Z strain, at least with regard to L sequence data. The Fushimi L peptide sequence differs from both other Sendai strains examined in 2 amino acid positions (2081 and 2206, Table 1); these mutations therefore occur exclusively in the Fushimi strain L protein.

Functional domains of the L protein

The differences in the L amino acid sequence between the three Sendai virus strains Fushimi, Enders, and Z are not evenly distributed over the whole length of the peptide; the carboxyterminal third is conserved. Over a range of 794 amino acids, from position 1286 to 2080, all nucleotide exchanges concerned only the wobble base (Table 1 and Fig. 3). Such high conservation of the amino acid sequence in this range of Sendai L proteins indicates an important functional domain. This suggestion is strongly supported by the fact that the above-mentioned consensus motifs for RNA polymerases and ATP-binding sites are located within this postulated domain. Poch et al. (1990) as well as Barik et al. (1990) postulated an L protein structure consisting of concatenated functional domains characterized by distinct blocks of conserved amino acid sequences and separated by variable regions. Both reports are based on L protein sequence alignments of several different nonsegmented negative-strand RNA viruses. Supporting the postulated L

TABLE 1

Changes in the Sendai virus L gene

Nucleotide changes

Nucleotide pos.: 84 218 224 293 371 383 491 599 611 612 758 800 1004 1166 1631 1763 1769 1787 1901 1904 1946 2120 2123 2423 2426 2582 2996 3014 3143 3497 3533 3647 3881 4145 4292 4343 4610 4634 4769 5240 5321 5819 5957 6242 6269 6503 6644 6683 6776

F: AK CCT CAG GAT GCG CGG AGC CCA CTC C#T GAG AGG AGC GCT AAG CTG AGC AAA AAA AAG AGG CCA GTC GAA ACC AGG AGC TCA GGC % AGT TCC ;g AGC CTA TCC GAC CGC TAC GAT CTT TGT CAG GAG AGC CTG ATC

E: *c* *** *** **A **# *#* *** *** *** *A* *** *** *** *** **c *** *** *** *** *** *** *** **G A** *** *** *** *** C** *** *** *** *** c** **A *** *** *** *** *** *** *** *** *** *** A+* *** **G *** ***

Z: **c **A **A *** *** G** **G **T *** **A *A* **T **c G** **A **G **G G** **T **A *** *** **G *T* *A’ **T *** +*A c** *T* T** *** **A **c **T T** **T **T **T **T **c **c **c A** **A **G **A G**

Amino acid differences

Pos.               F         E         Z
89                 D         §         E
155                S         §         G
258                R         R         K
535                K         K         E
581                S         S         R
625, 626           K, K      :         E, N
699                V         I         V
800                T         T         I
852                R         R         K
996                S         P         S
1169, 1207, 1285   T, S, S   T, S, P   k S
2081               Q         K         K
2206               S         R         R

F = Fushimi strain; E = Enders strain; Z = Z strain; * = nucleotide identical with F; # = missing nucleotide; § = owing to frameshifts in the nucleotide sequence of the Enders strain L gene, the open reading frame (and therefore also the comparison between the F and E peptide sequences) starts at amino acid 196.
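The conservation percentages quoted in the text follow directly from the counts of differences and the lengths of the gene (6800 nucleotides) and of the open reading frame (2228 codons). The sketch below reproduces that arithmetic; the function name is hypothetical, and the single-nucleotide insertions and deletions in the Enders sequence are deliberately left out, on the assumption that the reported 13 base mutations refer to substitutions only.

```python
def percent_identity(length: int, differences: int) -> float:
    """Percentage of positions that are identical between two aligned sequences."""
    return 100.0 * (length - differences) / length


GENE_NT = 6800   # L gene length in nucleotides (from the text)
ORF_AA = 2228    # codons in the long open reading frame (from the text)

# (nucleotide differences, amino acid exchanges) relative to strain Fushimi
COMPARISONS = {"Z": (41, 13), "Enders": (13, 5)}

if __name__ == "__main__":
    for strain, (nt_diff, aa_diff) in COMPARISONS.items():
        nt = percent_identity(GENE_NT, nt_diff)
        aa = percent_identity(ORF_AA, aa_diff)
        print(f"Fushimi vs {strain}: {nt:.1f}% nt, {aa:.1f}% aa")
```

Rounded to one decimal place, these values agree with the 99.4% (Fushimi/Z) and 99.8% (Fushimi/Enders) given in the text.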

Fig. 3. A: Approximate positions of conserved regions in L proteins of nonsegmented negative-strand RNA viruses according to (A1): Poch et al. (1990) and (A2): Barik et al. (1990). B: Amino acid changes between the L proteins of different Sendai virus strains: Fushimi/Enders (B2) and Fushimi/Z (B3). Scale in amino acids (B1); exact exchange positions are listed in Table 1.
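The statement that no amino acid exchange falls between positions 1286 and 2080 can be checked against the exchange positions listed in Table 1. The following sketch does so; the variable and function names are assumptions for the illustration, while the positions and the window are taken from the table and the text.

```python
# Amino acid positions at which at least one strain differs (Table 1).
EXCHANGE_POSITIONS = [89, 155, 258, 535, 581, 625, 626, 699, 800, 852,
                      996, 1169, 1207, 1285, 2081, 2206]

# Region in which all nucleotide exchanges affect only the wobble base.
CONSERVED_WINDOW = (1286, 2080)


def positions_inside(positions, window):
    """Return the positions that lie within the inclusive window."""
    first, last = window
    return [p for p in positions if first <= p <= last]


if __name__ == "__main__":
    inside = positions_inside(EXCHANGE_POSITIONS, CONSERVED_WINDOW)
    print("exchanges inside conserved window:", inside)
```

The exchanges at positions 1285 and 2081 lie directly on either side of the window, which is how its boundaries are defined.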

protein structure, the amino acids that differ in the L protein of the Fushimi strain from those of the two other strains appear to occur in clusters and lie preferentially outside the conserved domains identified by Barik et al. (1990), with a minor exception concerning Block D (Fig. 3). An extensive hydrophilic domain in the Sendai Fushimi L protein, spanning nearly 100 amino acids, was found in the region of the conserved Block B or Box III described by Barik et al. (1990) and Poch et al. (1990), respectively (data not shown). This large hydrophilic area indicates a potential site of enzymatic activity, in accordance with its high degree of conservation.
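The open reading frame coordinates reported earlier in the Results can be cross-checked with simple arithmetic, sketched below. The function names are hypothetical, and for the reverse frame it is assumed that the stated end position (313) includes the stop codon.

```python
def stop_codon_start(first_aug: int, codons: int) -> int:
    """Position of the first nucleotide of the stop codon that follows
    an open reading frame of `codons` sense codons starting at `first_aug`."""
    return first_aug + 3 * codons


def reverse_orf_amino_acids(start: int, end: int) -> int:
    """Number of amino acids encoded by a frame read from `start` down to `end`
    on the opposite strand, assuming the span includes the stop codon."""
    span = start - end + 1
    if span % 3 != 0:
        raise ValueError("span is not a whole number of codons")
    return span // 3 - 1


if __name__ == "__main__":
    print(stop_codon_start(29, 2228))           # main L open reading frame
    print(reverse_orf_amino_acids(1062, 313))   # frame on the genome sense-strand
```

With the values from the text, the first function gives position 6713 for the UAA stop codon and the second gives 249 amino acids for the reverse frame, consistent with the reported figures.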

Discussion

DNA sequence data of related viral genomes can provide important information about functionally homologous proteins and can lead to a better understanding of the molecular biology of viral genes. DNA sequence analysis has therefore become an important tool, not only in the structural but also in the functional characterization of different viral genes. On the other hand, DNA sequencing is a slow and costly approach when the classical methods of DNA sequence analysis are applied. The introduction of semiautomated and therefore more rapid DNA sequencing systems makes it feasible to sequence large fragments of DNA in less time.

Owing to extensive modifications of the original sequencing protocol of Smith et al. (1985), which introduced the use of dye-labelled terminators instead of dye-labelled primers (Schmidt, 1990), the ABI DNA sequencing system offers the possibility of applying a primer walking strategy to the sequence analysis of large DNA fragments. Although labelling of different walking primers for the ABI DNA sequencing system had been possible, Kaiser et al. (1989) demonstrated that most of these primers yielded only unsatisfactory results, primarily because of different migration behaviour causing mobility shifts that could not be corrected by the installed software programs. In this sequencing project we showed that the availability of dye-labelled terminators for a DNA sequencing reaction has now made it possible to use almost any primer. Moreover, the application of labelled terminators prevents signal artifacts due to spontaneous stopping of the polymerization reaction at secondary structure impediments. The reduced resolution of the dye-terminator reactions probably results mainly from the less uniform signal intensities, a general phenomenon in the data output of these sequencing reactions. However, this did not lead to false nucleotide analysis or interpretation in the range of up to 350-400 bases.

The major advantage of the primer walking approach is that time-consuming subcloning steps are not necessary. If a homologous sequence is known, and if all primers are immediately available, two gels would (in principle) be sufficient to sequence both strands of a gene with a length comparable to that of the Sendai virus L gene. The primer walking strategy should therefore be applicable to similar projects.

We also describe the nucleotide sequence of the L gene of Sendai virus strain Fushimi, deduce the primary structure of the Fushimi strain L protein, and compare it to the deduced proteins of the two other known Sendai L sequences, those of the Enders and Z strains. These data show that the overall mutation rate of the L protein gene is significantly lower than the exchange frequency of other genes between different Sendai strains (Neubert et al., 1991), indicating its high conservation and functional importance. As previously published for the L gene of Sendai virus strain Z (Shioda et al., 1986) and other Paramyxoviridae (Stec et al., 1991; Kawano et al., 1991), our results identify a monocistronic L mRNA for the Fushimi strain.
In contrast to our data, Morgan and Rakestraw (1986) postulated three open reading frames located on the L mRNA of Sendai virus strain Enders. Sequence comparisons of the L genes of the Fushimi and Enders strains showed two deletions and two insertions resulting in shifts in the reading frame. As a consequence, the deduced amino acid sequence of the Enders strain L protein lacks 180 amino acids at the amino-terminal end when compared to the Fushimi strain L protein, which confirms the data obtained for the Z strain by Shioda et al. (1986).

Besides the major open reading frame present on the mRNA sense-strand and encoding 2228 amino acids, an additional shorter reading frame was identified, which is located on the viral genome sense-strand and is capable of coding for a peptide of 249 amino acids. This is 8 times longer than the average length of open reading frames on the Sendai minus-sense genomic RNA. Interestingly, on Influenza B viral RNA segment 1 (DeBorde et al., 1988) and on Influenza C segment 2 (both genes encode parts of the viral polymerase; Yamashita et al., 1989), two open reading frames exist on the minus-sense genome strand, generating two hypothetical peptides of lengths comparable to that identified in the L gene of Sendai virus. The Influenza B open reading frame contains 252 codons, whereas the reverse peptide of Influenza C is composed of 180 amino acids. Both frames are located at approximately the same positions in the polymerase genes as in the case of the Sendai virus L protein, but no significant homology or similarity in amino acid composition between the three proteins is detectable. Proving the existence of these reverse peptides would demonstrate for the first time a coding capacity of the viral minus-strand RNA for para- and orthomyxoviruses.

Analysis of the deduced amino acid sequence of the Fushimi strain L protein suggested the existence of several conserved motifs of RNA-dependent RNA polymerases. According to Poch et al. (1989), an enzymatically active domain may be present in the range of amino acids 660 to 850, containing four different polymerase elements, among them the invariant pentapeptide QGDNQ. Furthermore, a second functional domain might exist between amino acids 1430 and 1650, where three different versions of the conserved polymerase motif Asp-Asp are located (Kamer and Argos, 1984). In this hypothetical domain no amino acid exchanges were observed in the comparison of the L proteins of the three different Sendai virus strains.

In addition to the polymerase motifs, the Fushimi L protein contains two glycine-rich sequences which conform to a consensus ATP-binding motif. The L protein activities which would utilize ATP are the polymerization-related functions as well as the protein kinase activity (Einberger et al., 1990; Kondo et al., 1990). The potential ATP-binding motifs of the Fushimi L gene product differ from the consensus GxGx2Gx16-28K (Kamps et al., 1984) with respect to the position of the lysine and the exact order of the glycine residues, but they might nevertheless confer enzymatic activity. The functional motifs described above will represent useful targets for site-directed mutagenesis experiments to study the different enzymatic functions of the L protein.
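For readers who wish to screen a protein sequence for the consensus ATP-binding motif of Kamps et al. (1984) quoted above, GxGx2Gx16-28K, the pattern can be written as a regular expression. The sketch below is only an illustration: the function name is hypothetical and the test sequence is an artificial toy peptide, not part of the Sendai virus L protein.

```python
import re

# G-x-G-x-x-G followed, 16 to 28 residues later, by a lysine.
KAMPS_CONSENSUS = re.compile(r"G.G..G.{16,28}K")


def find_atp_binding_motifs(protein: str) -> list:
    """Return (1-based start position, matched segment) for each
    non-overlapping match of the consensus in a one-letter protein sequence."""
    return [(m.start() + 1, m.group()) for m in KAMPS_CONSENSUS.finditer(protein)]


if __name__ == "__main__":
    # Artificial peptide containing one copy of the consensus.
    toy = "MA" + "GAGTTG" + "A" * 20 + "K" + "LL"
    print(find_atp_binding_motifs(toy))
```

Because the Fushimi L protein motifs deviate from this consensus in the position of the lysine and the order of the glycines, a strict pattern of this kind would not be expected to find them; a search for such variants would need a correspondingly modified expression.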

Acknowledgements

The authors wish to thank C. Schommer, J. Kramer and C. Baumann for excellent technical assistance and Dr. J. Gulizia for reading the manuscript. This study was supported by a grant from the Deutsche Forschungsgemeinschaft, Schwerpunkt 'Virulenzfaktoren und Wirtstropismus animaler Viren', by a grant from the Bundesministerium für Forschung und Technologie and by donations from Hoechst AG and Wacker GmbH.

References

Ansorge, W., Sproat, B., Stegemann, J. and Schwager, C. (1986) A non-radioactive automated method for DNA sequence determination. J. Biochem. Biophys. Methods 13, 315-323.
Barik, S., Rud, E.W., Luk, D., Banerjee, A.K. and Kang, C.Y. (1990) Nucleotide sequence analysis of the L gene of Vesicular Stomatitis Virus (New Jersey serotype): identification of conserved domains in L proteins of nonsegmented negative-strand RNA viruses. Virology 175, 332-337.
Blumberg, B.M., Crowley, J.C., Silverman, J.I., Menonna, J., Cook, S.D. and Dowling, P.C. (1988) Measles virus L protein evidences elements of ancestral RNA polymerase. Virology 164, 487-497.
Chirgwin, J.M., Przybyla, A.E., McDonald, R.J. and Rutter, W.J. (1979) Isolation of biologically active ribonucleic acid from sources enriched in ribonuclease. Biochemistry 18, 5294-5299.
Dayhoff, M.O., Hunt, L.T. and Hurst-Calderone, S. (1978) Composition of proteins. In: M.O. Dayhoff (Ed.), Atlas of Protein Sequence and Structure, Vol. 5, Suppl. 3. Natl. Biomed. Res. Found., Washington, D.C., pp. 363-373.
DeBorde, D.C., Donabedian, A.M., Herlocher, M.L., Naeve, C.W. and Maassab, H.F. (1988) Sequence comparison of wild-type and cold-adapted B/Ann Arbor/1/66 influenza virus genes. Virology 163, 429-443.
Devereux, J., Haeberli, P. and Smithies, O. (1984) A comprehensive set of sequence analysis programs for the VAX. Nucleic Acids Res. 12, 387-395.
Einberger, H., Mertz, R., Hofschneider, P.H. and Neubert, W.J. (1990) Purification, renaturation and reconstituted protein kinase activity of the Sendai virus large (L) protein: L protein phosphorylates the NP and P proteins in vitro. J. Virol. 64, 4274-4280.
Galinski, M.S., Mink, M.A. and Pons, M.W. (1988) Molecular cloning and sequence analysis of the human parainfluenza 3 virus gene encoding the L protein. Virology 165, 499-510.
Hidaka, Y., Kanda, T., Iwasaki, K., Nomoto, A., Shioda, T. and Shibuta, H. (1984) Nucleotide sequence of a Sendai virus genome region covering the entire M gene and the 3' proximal 1013 nucleotides of the F gene. Nucleic Acids Res. 12, 7965-7972.
Kaiser, R., MacKellar, S.L., Vinayak, R.S., Sanders, J.Z., Saavedra, R.A. and Hood, L.E. (1989) Specific-primer-directed DNA sequencing using automated fluorescence detection. Nucleic Acids Res. 17, 6087-6102.
Kamer, G. and Argos, P. (1984) Primary structural comparison of RNA-dependent polymerases from plant, animal and bacterial viruses. Nucleic Acids Res. 12, 7269-7282.
Kamps, M.P., Taylor, S.S. and Sefton, B.M. (1984) Direct evidence that oncogenic tyrosine kinase and cyclic AMP-dependent protein kinase have homologous ATP-binding sites. Nature 310, 589-592.
Kawano, M., Okamoto, K., Bando, H., Kondo, K., Tsurudome, M., Komada, H., Nishio, M. and Ito, Y. (1991) Characterizations of the human parainfluenza type 2 virus gene encoding the L protein and the intergenic sequences. Nucleic Acids Res. 19, 2739-2746.
Kondo, K., Bando, H., Tsurudome, M., Kawano, M., Nishio, M. and Ito, Y. (1990) Sequence analysis of the phosphoprotein (P) genes of human parainfluenza type 4A and 4B viruses and RNA editing at transcript of the P genes: the number of G residues added is imprecise. Virology 178, 321-326.
Kozak, M. (1986) Point mutations define a sequence flanking the AUG initiator codon that modulates translation by eukaryotic ribosomes. Cell 44, 283-292.
Morgan, E.M. and Rakestraw, K.M. (1986) Sequence of the Sendai virus L gene: open reading frames upstream of the main coding region suggest that the gene may be polycistronic. Virology 154, 31-40.
Neubert, W.J. and Hofschneider, P.H. (1983) Transient rescue of Sendai-6/94 c1 virus from the persistently infected cell line CI-E-8 by cocultivation. Virology 125, 445-453.
Neubert, W.J., Eckerskorn, C. and Homann, H. (1991) Sendai virus NP gene codes for a 524 amino acid NP protein. Virus Genes 5:1, 25-32.
Poch, O., Sauvaget, I., Delarue, M. and Tordo, N. (1989) Identification of four conserved motifs among the RNA-dependent polymerase encoding elements. EMBO J. 8, 3867-3874.
Poch, O., Blumberg, B.M., Bougueleret, L. and Tordo, N. (1990) Sequence comparison of five polymerases (L proteins) of unsegmented negative-strand RNA viruses: theoretical assignment of functional domains. J. Gen. Virol. 71, 1153-1162.
Prober, J.M., Trainor, G.L., Dam, R.J., Hobbs, F.W., Robertson, C.W., Zagursky, R.J., Cocuzza, A.J., Jensen, M.A. and Baumeister, K. (1987) A system for rapid DNA sequencing with fluorescent chain-terminating dideoxynucleotides. Science 238, 336-341.
Sambrook, J., Fritsch, E.F. and Maniatis, T. (1989) Molecular cloning: a laboratory manual. Cold Spring Harbor Laboratory Press, New York.
Schmidt, G. (1990) Spaziergang durch das Genom. BioTec 9/90, 33-35.
Shioda, T., Hidaka, Y., Kanda, T., Shibuta, H., Nomoto, A. and Iwasaki, K. (1983) Sequence of 3,687 nucleotides from the 3' end of Sendai virus genome RNA and the predicted amino acid sequences of viral NP, P and C proteins. Nucleic Acids Res. 11, 7317-7333.
Shioda, T., Iwasaki, K. and Shibuta, H. (1986) Determination of the complete nucleotide sequence of the Sendai virus genome RNA and the predicted amino acid sequences of the F, HN and L proteins. Nucleic Acids Res. 14, 1545-1563.
Smith, L.M., Fung, S., Hunkapiller, M.W., Hunkapiller, T.J. and Hood, L.E. (1985) The synthesis of oligonucleotides containing an aliphatic amino group at the 5' terminus: synthesis of fluorescent DNA primers for use in DNA sequence analysis. Nucleic Acids Res. 13, 2399-2408.
Stec, D.S., Hill III, M.G. and Collins, P.L. (1991) Sequence analysis of the polymerase L gene of Human Respiratory Syncytial virus and predicted phylogeny of nonsegmented negative-strand viruses. Virology 183, 273-287.
Yamashita, M., Krystal, M. and Palese, P. (1989) Comparison of the three large polymerase proteins of influenza A, B, and C viruses. Virology 171, 458-466.
