Vol. 65, No. 6

JOURNAL OF VIROLOGY, June 1991, p. 2910-2920

0022-538X/91/062910-11$02.00/0 Copyright © 1991, American Society for Microbiology

Equine Arteritis Virus Is Not a Togavirus but Belongs to the Coronaviruslike Superfamily

JOHAN A. DEN BOON,1,2 ERIC J. SNIJDER,1* EWAN D. CHIRNSIDE,2† ANTOINE A. F. DE VRIES,2 MARIAN C. HORZINEK,2 AND WILLY J. M. SPAAN1

Department of Virology, Institute of Medical Microbiology, Faculty of Medicine, Leiden University, Postbus 320, 2300 AH Leiden,1 and Department of Virology, Institute of Infectious Diseases and Immunology, Veterinary Faculty, University of Utrecht, Utrecht,2 The Netherlands

Received 13 November 1990/Accepted 20 February 1991

The nucleotide sequence of the genome of equine arteritis virus (EAV) was determined from a set of overlapping cDNA clones and was found to contain eight open reading frames (ORFs). ORFs 2 through 7 are expressed from six 3'-coterminal subgenomic mRNAs, which are transcribed from the 3'-terminal quarter of the viral genome. A number of these ORFs are predicted to encode structural EAV proteins. The organization and expression of the 3' part of the EAV genome are remarkably similar to those of coronaviruses and toroviruses. The 5'-terminal three-quarters of the genome contain the putative EAV polymerase gene, which also shares a number of features with the corresponding gene of corona- and toroviruses. The gene contains two large ORFs, ORF1a and ORF1b, with an overlap region of 19 nucleotides. The presence of a "shifty" heptanucleotide sequence in this region and a downstream RNA pseudoknot structure indicates that ORF1b is probably expressed by ribosomal frameshifting. The frameshift-directing potential of the ORF1a/ORF1b overlap region was demonstrated by using a reporter gene. Moreover, the predicted ORF1b product was found to contain four domains which have been identified in the same relative positions in coronavirus and torovirus ORF1b products. The sequences of the EAV and coronavirus ORF1a proteins were found to be much more diverged. The EAV ORF1a product contains a putative trypsinlike serine protease motif. Our data indicate that EAV, presently considered a togavirus, is evolutionarily related to viruses from the coronaviruslike superfamily.

Equine arteritis virus (EAV) was first isolated from a fetus aborted during an endemic disease outbreak in pregnant mares (9). Serological evidence suggests that the virus is widespread in the horse population and only rarely causes disease. If it does, the clinical features are acute anorexia and fever, usually accompanied by palpebral edema, conjunctivitis, nasal catarrh, and edema of the legs, genitals, and abdomen. In infected pregnant mares, abortion is common (10, 15). A carrier state exists in seropositive stallions, in which EAV is produced in semen (47). These "shedding stallions" may consequently infect brood mares by a venereal route. Field isolates are rare and may be difficult to propagate in cell culture, and consequently the biology of EAV is poorly understood.

EAV has been classified as a member of the togavirus family (34, 53). The spherical enveloped EAV particle has a diameter of 50 to 70 nm (25). It consists of an icosahedral core structure of 35 nm (23) surrounded by an envelope carrying ringlike structures with a diameter of 12 to 15 nm (30). The viral genome is a single-stranded RNA of positive polarity with an estimated Mr of 4,000,000 (51). Four or five virion proteins have been described: a nucleocapsid (N) protein of 12 kDa, an unglycosylated 18-kDa envelope protein, a 21-kDa envelope protein, and one or two larger envelope proteins which, probably due to glycosylation, give rise to heterogeneous material of between 28 and 40 kDa (8a, 25, 54). During EAV replication, a 3'-coterminal nested set of seven virus-specific RNAs is produced, ranging in size from genome length (13 kb) to 0.8 kb (8, 48, 49). Viral subgenomic RNAs (sgRNAs) are composed of leader and body sequences which are not contiguous on the EAV genome. The sgRNAs may be produced by alternative splicing, since their leader sequence is derived from the extreme 5' end of the EAV genome (8).

In order to study the genome organization, replication strategy, and polypeptide composition of EAV, we have determined its genomic sequence. Our data reveal an ancestral relationship between EAV and the coronaviruses and toroviruses, two virus groups for which we recently proposed a superfamily status (40). Interestingly, coronaviruses and toroviruses display similar genome sizes and virion architecture which are completely different from those of the arteriviruses.

MATERIALS AND METHODS

cDNA clones and nucleotide sequence analysis. Generation of EAV cDNA, mapping of clones, subcloning of restriction fragments in M13 vectors, and DNA sequence analysis were performed as described previously (8).

Computer analysis of sequence data. Nucleotide sequence data were assembled and analyzed with the computer software designed by Staden (43). Amino acid sequence similarity searches were carried out with the FASTA program (31) and the NBRF Protein Identification Resource (release 22.0). Dot matrix comparisons, sequence alignments, and polypeptide structure analysis were carried out by using the COMPARE, GAP, and PEPTIDE STRUCTURE options from the software provided by the Genetics Computer Group, University of Wisconsin (version 5, 1989 [7]). Sequence alignments were optimized by visual inspection.
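The idea behind a dot matrix comparison can be illustrated with a minimal sketch. This is not the COMPARE program itself; the function name `dot_matrix`, its parameters, and the two short peptide strings are hypothetical and chosen only for illustration. A dot is recorded wherever a window of one sequence matches a window of the other at a minimum number of positions, so that related regions appear as diagonal runs.

```python
def dot_matrix(seq_a, seq_b, window=3, min_matches=3):
    """Return (i, j) pairs (0-based) where the windows starting at seq_a[i]
    and seq_b[j] are identical at no fewer than min_matches positions."""
    hits = []
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            # Count identical residues at corresponding window positions.
            matches = sum(
                x == y
                for x, y in zip(seq_a[i:i + window], seq_b[j:j + window])
            )
            if matches >= min_matches:
                hits.append((i, j))
    return hits


# Hypothetical toy sequences sharing the segment "KVLA".
hits = dot_matrix("MKVLAT", "GGKVLAGG")
print(hits)  # consecutive hits on one diagonal indicate a shared segment
```

Lowering `min_matches` relative to `window` makes the comparison more tolerant of substitutions, at the cost of more background dots.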

* Corresponding author.
† Present address: Equine Virology Unit, The Animal Health Trust, Lanwades Park, Kennett, Suffolk, England CB8 7PN.


Construction of pFSEAV and pFSEAV8. Construct pFSEAV (see Fig. 4A) was similar to vector pBSFS, which had been developed to test ribosomal frameshifting during Berne virus polymerase gene expression (40). An intermediary construct from the pBSFS construction, named pBSMBB (40), was used to create pFSEAV. Vector pBSMBB contained a copy of the mouse hepatitis virus (MHV) A59 membrane (M) protein gene into which BglII and BamHI linkers had been inserted. Following digestion of pBSMBB with BglII and BamHI, pFSEAV was generated by cloning a 425-bp Sau3A-Sau3A fragment (nucleotide [nt] positions 5368 to 5793 in Fig. 2) from EAV clone 579 into pBSMBB. This 425-bp fragment contained the EAV polymerase open reading frame 1a (ORF1a)/ORF1b overlap region (see Fig. 4A). Construct pFSEAV8 was obtained by deleting the region between the SmaI (nt position 5512 in Fig. 2) and PvuII (nt position 5788 in Fig. 2) sites from the EAV ORF1b sequence in pFSEAV. A HindIII site, which is present downstream of the hybrid gene in the pBS multiple cloning region, was used for this purpose (see Fig. 4A). pFSEAV was digested with PvuII and HindIII, and the restriction fragment which contained the EAV ORF1b/MHV M (C terminus) junction was purified. This fragment was recloned into SmaI- and HindIII-digested pFSEAV, resulting in a 276-bp deletion (nt positions 5515 to 5790 in Fig. 2), which leaves the EAV ORF1b reading frame intact. The orientations of inserts and nucleotide sequences at the MHV M/EAV polymerase junctions were examined by sequence analysis. The hybrid genes in pFSEAV and pFSEAV8 were under the control of the pBS T7 promoter.

In vivo expression of pFSEAV and pFSEAV8. pFSEAV and pFSEAV8 were expressed in vivo by transfection of HeLa cells. The HeLa cells had previously been infected with vaccinia virus recombinant vTF7-3, which contains the T7 polymerase gene under the control of a vaccinia virus promoter (11).
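The coordinates quoted above for the pFSEAV8 deletion can be cross-checked with a few lines of arithmetic. The sketch below assumes that the positions given for the SmaI and PvuII sites refer to the first nucleotide of each 6-bp recognition sequence; both enzymes cut bluntly in the middle of their sites (CCC^GGG and CAG^CTG). The function and variable names are illustrative only.

```python
# Coordinates quoted in the text (numbering of Fig. 2).
SMAI_SITE_START = 5512   # assumed: first nt of the SmaI site (CCC^GGG)
PVUII_SITE_START = 5788  # assumed: first nt of the PvuII site (CAG^CTG)
HALF_SITE = 3            # both enzymes cut in the middle of a 6-bp site


def deleted_interval(left_site_start, right_site_start, half_site=HALF_SITE):
    """Return (first, last) nt removed when the two blunt ends are joined."""
    first_deleted = left_site_start + half_site       # nt just 3' of the left cut
    last_deleted = right_site_start + half_site - 1   # nt just 5' of the right cut
    return first_deleted, last_deleted


first, last = deleted_interval(SMAI_SITE_START, PVUII_SITE_START)
deletion_length = last - first + 1
frame_preserved = deletion_length % 3 == 0  # a multiple of 3 keeps the frame

print(first, last, deletion_length, frame_preserved)
```

Under that assumption the result agrees with the 276-bp deletion stated in the text, and a deletion of 276 nt (92 codons) is compatible with an intact ORF1b reading frame.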
Infections, transfections, and metabolic labeling were performed as described previously (40). Immunoprecipitation of expression products with antisera directed against the N and C termini of the MHV M protein was carried out as described before (40). In order to reduce nonspecific precipitation, 0.2% sodium dodecyl sulfate (SDS) (instead of 0.1%) was used in immunoprecipitations with the N-terminal antiserum.

Nucleotide sequence accession number. The sequence reported here has been assigned EMBL accession number X53459.

RESULTS AND DISCUSSION

Nucleotide sequence analysis of the EAV genome. The synthesis of a genomic EAV cDNA library and the mapping of clones which cover all but the 18 nucleotides at the 5' end of the EAV genome have been described recently (8). By similar methods, additional cDNA clones were obtained which allowed sequence analysis of the EAV genome on at least two independent cDNA clones (Fig. 1). However, nt 19 to 355 were present only in cDNA clone 586. This clone contained the EAV genomic leader sequence, which is adjoined to the 5' end of all EAV RNA species. Partial sequence analysis of clone 586 has been reported previously (8).

FIG. 1. Organization of the EAV genome. The cDNA clones which were used in the sequence analysis of the EAV genome are indicated in the upper part of the figure. Dotted lines indicate the parts which were sequenced. The lower part of the figure shows the positions of the ORFs (based on the sequence) and the positions of the sgRNAs (based on RNA lengths estimated from the gel; see text). The solid box indicates the EAV leader sequence.

The consensus nucleotide sequence of the EAV genome (12.7 kb) and the deduced amino acid sequences of EAV proteins are presented in Fig. 2. The primer extension experiments which were used to determine the length of the EAV leader sequence (8) were repeated with an oligonucleotide which is complementary to positions 81 to 100 in the sequence presented in Fig. 2 (data not shown). The EAV leader sequence was calculated to be 207 nt long (1 nt shorter than reported previously [8]), indicating that the sequence in Fig. 2 lacks only the 18 most-5'-terminal nucleotides of the EAV genome. Experiments to obtain this sequence are in progress.

EAV genome structure and expression. Eight open reading frames (ORFs) were identified in the EAV genome (Fig. 1 and 2). We assume that ORF1a and ORF1b, which take up about 75% (9.7 kb) of the genome, are translated from the genomic RNA and encode the EAV polymerase (POL) or replicase (see below). The other six ORFs, which are partially overlapping, are thought to be expressed from six subgenomic mRNAs generated in EAV-infected cells. The bodies of these 3'-coterminal mRNAs are homologous to the 3' end of the genome (8). In addition, they contain a common leader sequence at their 5' end (8) (Fig. 1). We have reestimated the lengths of the EAV sgRNAs (reported earlier by van Berlo et al. [48]) on the basis of the agarose gels presented by de Vries et al. (8). Sizes of 3.2, 2.7, 2.2, 1.9, 1.2, and 0.8 kb [including a poly(A) tail] were now calculated for EAV mRNAs 2, 3, 4, 5, 6, and 7, respectively. Some characteristics of and our proposed nomenclature for the EAV RNAs and ORFs are listed in Table 1. The sequence 5'-UCAAC-3' has been identified as the site where the EAV leader sequence is joined to the body of sgRNAs 6 and 7 (8). As a result, ORFs 6 and 7 are located in the "unique" 5'-terminal regions of these RNAs. The EAV genome was found to contain 18 UCAAC motifs (indicated in Fig. 2), and all ORFs (except ORF1b) are preceded by one or more of these motifs. The sizes of the bodies of EAV sgRNAs 2 through 5 closely correspond to the positions of UCAAC junction motifs and ORFs 2 through 5 on the genome (Fig. 1 and 2).
Many of the EAV ORFs are preceded by multiple UCAAC motifs, but only one of these potential junction sites seems to be used in sgRNA generation. For example, ORF7 is preceded by two UCAAC motifs, but no heterogeneity in the size of RNA 7 has been detected (8). In addition, the presence of UCAAC motifs at positions which do not correspond to an sgRNA confirms the view that it is not only the primary RNA structure at this position which determines the functionality of a UCAAC motif in leader-body joining.

FIG. 2. Nucleotide sequence analysis of the EAV genome. The first 18 nt of the genome, which remain to be determined, are indicated by N. The deduced amino acid sequences of EAV proteins are shown. Potential N-glycosylation sites in the products of ORFs 2 through 5 are indicated by triangles. Potential TCAAC (UCAAC) leader-body junction sites in the sequence are underlined.
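A motif survey of the kind described above (locating every UCAAC pentanucleotide and asking which ones lie upstream of a given ORF) can be sketched as follows. The sequence `TOY_GENOME`, the window size, and the function names are hypothetical and serve only to illustrate the bookkeeping; they are not the EAV sequence or the procedure actually used.

```python
def find_motif(seq, motif="UCAAC"):
    """Return the 1-based start positions of every occurrence of motif."""
    positions = []
    i = seq.find(motif)
    while i != -1:
        positions.append(i + 1)        # convert 0-based index to 1-based
        i = seq.find(motif, i + 1)     # continue searching after this hit
    return positions


def motifs_upstream(seq, orf_start, window, motif="UCAAC"):
    """Return motif positions within `window` nt upstream of a 1-based ORF start."""
    return [
        p for p in find_motif(seq, motif)
        if orf_start - window <= p < orf_start
    ]


# Hypothetical toy RNA with three UCAAC motifs and AUG codons at nt 8 and 26.
TOY_GENOME = "GGUCAACAUGAAAUCAACUCAACGGAUGCCC"

print(find_motif(TOY_GENOME))
print(motifs_upstream(TOY_GENOME, orf_start=26, window=15))
```

In the toy example the second ORF is preceded by two motifs within the window, a situation comparable to that described for ORF7. As the text points out, the presence of a motif alone does not show that it is used as a junction site.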

On the basis of a remarkable similarity with sequences involved in the splicing of Tetrahymena pre-rRNA (8), alternative splicing has been proposed as a possible mechanism for the generation of EAV sgRNAs. However, in view of the similarities between the EAV and coronavirus polymerases described in this article (see below), a role for the EAV leader sequence and the conserved pregenic UCAAC motifs in a coronaviruslike transcription mechanism would not be surprising. The presence of negative-stranded sgRNAs in infected cells, which has recently been described for coronaviruses (35, 36), would be an important indication for a coronaviruslike transcription mechanism in EAV. Experiments to study the negative-stranded RNAs in EAV-infected cells are now in progress.

TABLE 1. Characteristics of EAV RNAs and ORFs

RNA   Estimated    ORF   Nucleotides       No. of amino     Size
      size (kb)          (first-last)      acids encoded    (kDa)
1     13           1a    226-5,406         1,727            186.9
                   1b    5,406-9,749       1,448            159.0
2     3.2          2     9,825-10,505      227              25.6
3     2.7          3     10,307-10,795     163              18.0
4     2.2          4     10,701-11,156     152              17.2
5     1.9          5     11,147-11,911     255              28.7
6     1.2          6     11,902-12,387     162              17.7
7     0.8          7     12,314-12,643     110              12.3

Analysis of ORFs encoded by sgRNAs. EAV ORFs 2, 3, and 4 encode polypeptides with hydrophobic N and C termini and predicted sizes of 25.6, 18.0, and 17.2 kDa, respectively (Table 1). ORF5 encodes a 28.7-kDa product with a hydrophobic N terminus and an internal hydrophobic domain. The predicted products of ORFs 2 through 5 contain 1, 6, 3, and 1 potential N-glycosylation sites, respectively (Fig. 2). In view of these characteristics, each of these ORFs may encode an envelope protein in the range between 28 and 40 kDa. ORF6 encodes a polypeptide of 17.7 kDa which is predicted to be an unglycosylated triple-spanning membrane protein. The deduced ORF7 product is 110 amino acids long (12.3 kDa). This is in good agreement with the size of the product obtained after in vitro translation of the corresponding sgRNA and the identification of this protein as the EAV N protein (8a, 24, 50). Amino acid sequence comparisons between the deduced products of EAV ORFs 2 through 7 and the sequences currently available in the NBRF Protein Identification Resource did not reveal any significant similarities with other viral or cellular proteins. The results of a further study of the EAV structural proteins will be presented elsewhere (8a).

EAV polymerase gene resembles the polymerase gene of coronaviruses and toroviruses. By definition, the genomic RNA of a positive-stranded RNA virus encodes the viral polymerase or replicase. ORFs of 5.2 kb (ORF1a) and 4.2 kb (ORF1b) were identified in the region of the EAV genome which is not transcribed into sgRNAs.
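The ORF coordinates listed in Table 1 can be checked for internal consistency with simple arithmetic: the span from first to last nucleotide should equal three times the listed number of amino acids. The sketch below takes the table values at face value; the dictionary `TABLE_1` and the helper names are illustrative. It also reports the offset between the listed first nucleotides of ORF1a and ORF1b modulo 3, where a value of 2 is equivalent to a shift of -1.

```python
# Values transcribed from Table 1: ORF -> (first nt, last nt, no. of amino acids).
TABLE_1 = {
    "1a": (226, 5406, 1727),
    "1b": (5406, 9749, 1448),
    "2": (9825, 10505, 227),
    "3": (10307, 10795, 163),
    "4": (10701, 11156, 152),
    "5": (11147, 11911, 255),
    "6": (11902, 12387, 162),
    "7": (12314, 12643, 110),
}


def orf_length(first, last):
    """Length in nucleotides of an ORF given inclusive 1-based coordinates."""
    return last - first + 1


# True where the nucleotide span equals 3 x the listed amino acid count.
consistent = {
    name: orf_length(first, last) == 3 * n_aa
    for name, (first, last, n_aa) in TABLE_1.items()
}

# Offset between the listed start coordinates of ORF1b and ORF1a, modulo 3.
# An offset of 2 is the same as -1, i.e. one nucleotide back from the
# next in-frame codon boundary of ORF1a.
frame_offset = (TABLE_1["1b"][0] - TABLE_1["1a"][0]) % 3

print(consistent)
print(frame_offset)
```

This is only a bookkeeping check on the tabulated numbers and says nothing about how the ORF boundaries were defined.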
These ORFs overlap over a distance of 19 nt, ORF1b being in the -1 reading frame with respect to ORF1a. To exclude possible gel-reading errors, the ORF1a/ORF1b overlap sequence was determined from three independent cDNA clones. The organization of the EAV POL gene was found to be remarkably similar to that of the POL genes of two other evolutionarily related groups of viruses which produce nested sets of subgenomic mRNAs, the coronaviruses (3, 4) and the toroviruses (40). In these two virus groups, the POL gene also consists of two ORFs, which overlap over a short distance. A subgenomic mRNA from which ORF1b could be expressed is lacking in cells infected with coronaviruses (for a review, see reference 42), toroviruses (38, 39), and arteriviruses (8, 48). Ribosomal frameshifting, resembling that in retroviruses


FIG. 3. Predicted secondary and tertiary RNA (pseudoknot) structure of the EAV ORF1a/ORF1b overlap region. The slippery sequence G UUA AAC is indicated by a dashed line. The ORF1a termination codon is underlined. The SmaI site in the downstream pseudoknot domain, which was digested in the construction of pFSEAVδ, is indicated by a dotted line.

(26), has been found to be the ORF1b expression mechanism for the coronavirus infectious bronchitis virus (IBV [5]). A "slippery" heptanucleotide sequence (5'-UUUAAAC-3') and a downstream RNA pseudoknot (32) were shown to be essential for efficient frameshifting in the ORF1a/ORF1b overlap region (6). Sequence analysis of the POL genes of the mouse hepatitis coronavirus (MHV-A59 [4]) and the Berne torovirus (BEV [40]) revealed that this expression mechanism and the essential sequence elements in the ORF1a/ORF1b overlap region have been conserved during evolution of these virus groups. The frameshift-directing RNA pseudoknot, consisting of a stem-loop structure base-paired to a downstream sequence element, could also be folded at the 3' side of the EAV ORF1a/ORF1b overlap region (Fig. 3). A potential slippery sequence (5'-GUUAAAC-3') was identified just upstream of the stem of the EAV hairpin and was positioned at the 5' side of the termination codon for translation of ORF1a. These findings implied that ribosomal frameshifting was also the probable mechanism used to express EAV ORF1b. In addition, they were the first indications of an ancestral relationship between coronavirus, torovirus, and arterivirus polymerases.

EAV ORF1a/ORF1b overlap region directs ribosomal frameshifting. To study the frameshift-directing potential of the EAV ORF1a/ORF1b overlap region, we used the expression of a reporter gene similar to that used in the study of translational frameshifting in MHV-A59 (4) and BEV (40). The EAV ORF1a/ORF1b overlap region from clone 579 was cloned into a copy of the MHV-A59 26-kDa membrane protein (M) gene positioned downstream of a T7 promoter. The resulting construct, pFSEAV, consisted of the 5' portion of the MHV M ORF fused in frame to the EAV ORF1a/ORF1b overlap sequence, which in turn was fused in frame to the 3' portion of the MHV M ORF (Fig. 4A). Termination of translation at the ORF1a termination codon should produce a 24-kDa fusion protein (Fig. 4A).
Ribosomal frameshifting would lead to the synthesis of an additional fusion protein of 41 kDa. The two predicted products could be identified by immunoprecipitation with antibodies directed against the N- and C-terminal parts of the MHV M protein (Fig. 4A). Ribosomal frameshifting was studied in vivo by transfection of pFSEAV into HeLa cells which had previously been infected with vaccinia virus recombinant vTF7-3. Expression of pFSEAV led to the synthesis of proteins of 24 and 41 kDa. Both products were precipitated by the antiserum directed against the N terminus of the MHV M protein (Fig. 4B). Only the 41-kDa product was precipitated when the C-terminal antiserum was used (Fig. 4B). The amount of incorporated label in both protein bands was determined by scintillation counting, resulting in an estimated frameshifting efficiency of between 15 and 20% (the 24- and 41-kDa products contained 8 and 10 methionine residues, respectively).

FIG. 4. Analysis of EAV ribosomal frameshifting in vivo with constructs pFSEAV and pFSEAVδ. (A) Schematic representation of the pFSEAV construct. The solid box represents the T7 RNA polymerase promoter in pBS. Open boxes indicate EAV sequences; N- and C-terminal (term.) MHV M sequences are represented by differently hatched boxes. The composition of the two predicted translation products is shown. The SmaI, PvuII, and HindIII restriction sites which were used in the generation of pFSEAVδ are indicated. (B) SDS electrophoresis of pFSEAV (center) and pFSEAVδ (right) translation products immunoprecipitated by the N-terminal (αN) and C-terminal (αC) MHV M antisera. The two lanes on the left show the result of the same immunoprecipitations with an untransfected control lysate from vaccinia virus vTF7-3-infected cells. Sizes are shown in kilodaltons.

The EAV slippery sequence contains a nucleotide substitution compared with the corresponding sequence of coronaviruses and toroviruses (GUUAAAC instead of UUUAAAC). EAV therefore breaks the rule that slippery sequences consist of runs of three A, U, or G residues followed by the tetranucleotide UUUA, UUUU, or AAAC (26, 46). Although this may seem to conflict somewhat with the simultaneous slippage model proposed by Jacks et al. (26), there are indications that the mRNA-tRNA base-pairing requirements at the peptidyl (P) site of a frameshifting ribosome are more relaxed than those at the aminoacyl (A) site (6, 26). In fact, this nucleotide substitution may explain the lower frameshifting efficiency (in a reporter gene) of EAV than of coronaviruses and toroviruses.

The presence of a unique SmaI restriction site within the putative downstream sequence element of the EAV RNA pseudoknot allowed the deletion of this domain from pFSEAV (Fig. 4). Construct pFSEAVδ contained a 276-bp in-frame deletion (between the SmaI and PvuII sites depicted in Fig. 4A) in the EAV ORF1b sequence and should produce a frameshift product of about 31 kDa which should also contain the C-terminal MHV M protein sequence. However, expression of pFSEAVδ yielded only the 24-kDa product which resulted from "normal" termination of ORF1a translation (Fig. 4B). Even after prolonged exposure of the autoradiographs, a 31-kDa polypeptide could not be detected. This result proved that a domain more than 71 nt downstream of the stem-loop structure is essential for efficient frameshifting. In view of the data on retrovirus, coronavirus, and torovirus frameshifting, this domain is likely to be the sequence 5'-CCGGGA-3' at 70 to 75 nt downstream of the hairpin; the last four nucleotides of this sequence were deleted by the digestion of the SmaI site during the construction of pFSEAVδ (Fig. 3). The second loop (L2, according to the terminology of Pleij et al. [32]) of the pseudoknot, which connects the hairpin with the downstream sequence element, appears to be considerably larger in EAV than in other frameshift-employing viruses. On the basis of the results obtained with pFSEAVδ, the length of L2 in the EAV pseudoknot is assumed to be 69 nt, whereas loops 4 to 32 nt in length have been predicted for other viral frameshifting systems (6, 46). It is unclear whether the length of L2 may influence the stability of the RNA pseudoknot or the frameshifting efficiency.

The ORF1a/ORF1b ribosomal frameshifting mechanism and the RNA structures involved are remarkably conserved in coronaviruses (4-6), toroviruses (40), and arteriviruses (Fig. 3). Its occurrence in EAV identifies translational frameshifting as an ancient and probably essential regulating step in POL gene expression.

EAV ORF1b product has sequence similarity with the polymerase protein of coronaviruses and toroviruses. The conclusion that coronaviruses and toroviruses are evolutionarily related was based on the similar organization and expression of their POL genes and on the presence of four conserved domains in their predicted ORF1b products (40). Although the EAV ORF1b (4.2 kb) is considerably smaller than the corresponding ORFs of coronaviruses (8.0 kb) and toroviruses (6.9 kb), we have identified four domains in the sequence of its product which we postulate are homologous with the conserved motifs from coronavirus and torovirus ORF1b products (Fig. 5A).
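Two quantitative points from the frameshifting experiments above can be sketched in a few lines of Python. The first function tests a heptanucleotide against the slippery-sequence rule as quoted in the text, reading "runs of three A, U, or G residues" as three identical residues. The second shows one plausible way to turn label counts into a frameshifting efficiency by normalizing for the 8 and 10 methionine residues of the two products. That normalization, the function names, and the example counts are illustrative assumptions, not the authors' stated procedure or data.

```python
import re

# Rule quoted in the text (refs 26, 46): a run of three A, U, or G residues
# (read here as three identical residues) followed by UUUA, UUUU, or AAAC.
SLIPPERY_RULE = re.compile(r"^(AAA|UUU|GGG)(UUUA|UUUU|AAAC)$")


def follows_slippery_rule(heptamer: str) -> bool:
    """Return True if the 7-nt RNA sequence fits the rule quoted above."""
    return SLIPPERY_RULE.match(heptamer.upper()) is not None


def frameshift_efficiency(counts_stop: float, counts_shift: float,
                          met_stop: int = 8, met_shift: int = 10) -> float:
    """Estimate the fraction of ribosomes that shift frame.

    Assumption: the label counts of each band are divided by the number of
    methionine residues in that product to obtain relative molar amounts.
    """
    molar_stop = counts_stop / met_stop
    molar_shift = counts_shift / met_shift
    return molar_shift / (molar_stop + molar_shift)


if __name__ == "__main__":
    print(follows_slippery_rule("UUUAAAC"))  # coronavirus/torovirus heptamer
    print(follows_slippery_rule("GUUAAAC"))  # EAV heptamer
    # Hypothetical counts, chosen only to illustrate the arithmetic.
    print(round(frameshift_efficiency(8000, 2000), 3))
```

With these made-up counts the estimate falls inside the 15 to 20% range reported for pFSEAV, and the EAV heptamer fails the rule while the coronavirus and torovirus heptamer passes, in line with the discussion above.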

(A) [Schematic: positions of conserved domains 1 through 4 in the ORF1b products of MHV, IBV, BEV, and EAV.]

(B)
Domain 1
MHV  572 gvpvviGttKfygg-16-1ulwypkCDRauPni R-39-G-cyyvKpGGtSSGDatTafaNsvfn-58-si lSDgvvc-32-KcwvetdieKgp-hefcSqht-mlvkwOdevyLpypdpsRi lgagc 830
IBV  578 nasvviGttKfygg-16- lmGwypkCDRanPnl lR-39-G-giyvKpGGtSSGDatTayaNsvfn-58-simi lSDDgvvc-32-KcwvepdleKgp-hefcSqht-mlvevDGdpkyLpypdpsRi lgacv 836
BEV  509 ggfcl iGvsKy--g-18-vfGsDytkCDRtfPlsfR-35-G-ml lnKpGGtSSGDatTahsNtfyn-51-f lnf lSMDsf if-29-Kswst----KghieefcSah---i iktDGeyhfL--psrgRllasl 1 746
EAV  371 gspiylGksKfdpi- 6-cl5tD9esCDRstPalvR-31-GsvaftKrGGISSGDDiTsisNt7iys-38rviy90vvit-23-Kvdpk----KtvntssDSflgcrfkqvDGkcyL-aslqdvrslI 579

Domain 2
MHV  919 lqsvgaCvvCssqtslrcgsCirkpllCckCaydHvmstdHkyvlsvspyvCns 972
IBV  925 lqscgvCvvCnsqtilrcgnCirkpflCckCcydHvmhtdHknvlsinpyiCsq 978
BEV  843 anfdkvCfcCpnpavsvceeCyvplplCayCyyvHvvisnHskvedkfkcfCgq 896
EAV  641 qyesavCtvwaapvaksa-Cggwf--qgnC vpyHaghc-Httsifan---Cgh 687

Domain 3
MHV 1218 VqGPPGtGKshlaig--lavyyctA-56-TinaLPelvtdi i--vvDEVsml-16-yvyiGDpaQLp-30-fLgtCyRcpkeivdtvs-72-LG1qtqTvDSaQG-19-vnRfnVaitRakkgilc 1512
IBV 1210 VqGPPGsGKshfaig--lavyfssA-56-TinaLPevscdi 1-- 1vDEVsml-16-vvyvGDpaQLp-29-fLakCyRcpkeivdtvs-75-LGlnvqTvDSsQG-19-inRfnValtRakrgi lv 1506
BEV 1099 VmGPPGtGKt-tfvyd---tylskA-53-ThntLPf iksavl-- iaDEVs 1i-15-wi lGDpfQLs-25-yLtaCyRcppqi lsafs-64-LGdvt-TiDSsQG-19-vriviVgcsRsthlvvv 1374
EAV  799 1eGPPGsGKtfh0vkdvlavvg-sA144-Tv-rPqgtsegetfvDVayf-13-vkgy2lrQL-20-p [...] -37-LGhr--TiDSiQG17-rp avVavtRsqe_yi 1031

Domain 4
MHV 2322 HvvyGsfnq--ki iGG6Hi 1-igl-38-tviDl 1 lddfvdivk- 7-SkVvnvnvDfkdfqfMlwcn-eekvuTfYp-163-lalGgsva--ikiTEfs-17-fctn-vnaSSsEgFl i 2641
IBV 2274 Hi lyGevdk--pqlGG1Htvigm-38-tv1D1 1 lddf lel lr-10-SkVvtvsiDyhsinfMtwfe-dgsikTcYp-165-lalGgsfa--vkvTEts-17-fcta-vnaSSsEaFl 1i 2598
BEV 1921 HvfsGdftevgtdiGGvHhvval-29-tlvDvcanqlyekvk- 8-SkVifvniDfqdvqfMvfangeddiqTfYp-136-lalGgtiv--fktTEss-17-fftagvntSSsEvFw 2209
EAV 1236 HaclGeink--stvGGsHfifsq-27-sw yapsfepyjh- 4ky m_kpcr1fwrn----a-TfYv- 45-.sJyaaewas pp 8-ryivkr11SStEvF1c 1413

FIG. 5. Positions (A) and aligned sequences (B) of the conserved domains in the ORF1b product of the POL genes of the coronaviruses MHV-A59 (4) and IBV-M42 (3), the torovirus BEV (40), and the arterivirus EAV. Amino acids which are identical in all four sequences are shown in capital letters. Amino acids which are identical or conserved in EAV are underlined. Domain 1, amino acids described as conserved in the majority of positive-stranded RNA viruses (33) are indicated by asterisks. Domain 2, possible conserved cysteine and histidine residues are marked by asterisks. Domain 3, amino acids conserved in "Sindbis virus-like" RNA plant viruses (group A2 [20]) are indicated by asterisks.
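The capitalization convention described in the caption (residues identical in all four sequences shown in capital letters) is easy to express in code. The following is a minimal Python sketch under stated assumptions: the sequences are already aligned to equal length with "-" as the gap character, the function name is hypothetical, and the input fragments are made up for illustration rather than taken from Fig. 5.

```python
def mark_identical_columns(aligned: list[str]) -> list[str]:
    """Upper-case residues in columns where every sequence agrees.

    Gap characters ('-') are never capitalised, and all rows must have
    the same length (that is, they must already be aligned).
    """
    if len({len(row) for row in aligned}) != 1:
        raise ValueError("aligned sequences must have equal length")
    rows = [row.lower() for row in aligned]
    marked = [[] for _ in rows]
    for column in zip(*rows):
        identical = len(set(column)) == 1 and column[0] != "-"
        for out, residue in zip(marked, column):
            out.append(residue.upper() if identical else residue)
    return ["".join(out) for out in marked]


if __name__ == "__main__":
    # Toy, made-up fragments (not taken from Fig. 5).
    toy = ["ilsddgvv", "ilsddavv", "flsmdsfi", "vy-sddvv"]
    for row in mark_identical_columns(toy):
        print(row)
```

Only the fifth column of the toy input is identical in all four rows, so only that residue is capitalized. Marking residues that are merely conserved, as the underlining in the figure does, would need a substitution-group table and is left out of this sketch.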

Domain 1 represents the well-known polymerase or GDD motif which has been identified in the polymerases of all positive-stranded RNA viruses (for a review, see reference 33). The best match for EAV is in alignment with coronaviruses and toroviruses. EAV also contains a serine instead of a glycine residue in the GDD core of the motif (SDD). Motif 2 is rich in cysteine and histidine residues and was originally identified by Gorbalenya et al. (18) as a possible zinc-binding finger in the IBV polymerase. Although the spacing between the cysteine and histidine residues is not absolutely conserved, the EAV ORF1b product may also contain a part of this motif. Domain 3 has been postulated to represent a nucleoside triphosphate (NTP)-binding helicase which is commonly found in proteins of both viral and cellular origin (19, 22). A role for NTP-binding proteins in RNA duplex unwinding has been suggested. Recently, the plum pox potyvirus CI protein, which contains the putative helicase motif, was shown to exhibit an NTP-dependent RNA duplex-unwinding activity in vitro (28). The EAV helicase motif, like that of coronaviruses and toroviruses (40), is more closely related to the alphaviruslike helicase motifs than to the corresponding domains of picornaviruslike origin (20). To date, coronaviruses, toroviruses, and arteriviruses are the only positive-stranded RNA viruses with an unsegmented genome in which the order of the polymerase and helicase domains is reversed (i.e., the helicase domain is positioned downstream of the polymerase motif). The fourth conserved domain is located near the C terminus of the ORF1b product (Fig. 5) and has been described

FIG. 6. Comparative analysis of possible EAV protease domains. (A) Alignment of the putative EAV trypsinlike serine protease motif with cellular and alphavirus, flavivirus, and pestivirus serine proteases. Identical residues are shown in capital letters; putative catalytic residues are indicated with asterisks. The alignment was based on sequence data and comparisons presented in references 2 and 17. (B) Alignment of the putative EAV serine protease and coronavirus 3C-like cysteine protease motifs with the 3C protease sequences of selected picornaviruses and picornaviruslike viruses. The figure was based on sequence data and comparisons presented in references 1, 16, and 29. The alignment of the coronavirus 3C-like sequences with 3C proteases (Fig. 10 in reference 29) was modified to optimize the alignment with the EAV sequence. (C) Alignment of the putative EAV papainlike cysteine protease with selected cellular cysteine proteases and postulated coronavirus papainlike proteases. The figure was based on the alignment presented in Fig. 12 of reference 29. Abbreviations: BVDV, bovine viral diarrhea virus; CATB, rat cathepsin B; CATH, rat cathepsin H; CATL, rat cathepsin L; CDP, chicken calcium-dependent protease; CHT, chymotrypsin; CPMV, cowpea mosaic virus; ELA, elastase; EMCV, encephalomyocarditis virus; FMDV, foot-and-mouth disease virus; HAV, hepatitis A virus; HRV14, human rhinovirus type 14; IBV, infectious bronchitis virus strain M42; MCP, mouse cysteine protease; MHV, mouse hepatitis virus strain JHM; PAP, papain; PV1, poliovirus type 1; SBMV, southern bean mosaic virus; SIN, Sindbis virus; TRP, trypsin; YFV, yellow fever virus.

only for coronaviruses and toroviruses (40). The central part of this domain is missing in EAV. This could imply that motif 4 consists of multiple functional domains, not all of which have been conserved in the EAV polymerase.

Analysis of the EAV ORF1a sequence. Like the ORF1b product, the EAV ORF1a protein (1,727 amino acids) is considerably smaller than the corresponding coronavirus polypeptide (3,951 amino acids for IBV [3], 4,488 amino acids for MHV [29]). Comparison of the EAV and coronavirus ORF1a proteins did not reveal any obvious homologous domains. This is not surprising, since the ORF1a product has been reported to be much more diverged than the ORF1b protein: when the ORF1a-encoded amino acid sequences of the coronaviruses IBV and MHV were compared, only moderate sequence similarities were observed (29).
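As a rough cross-check on the protein sizes quoted in this section, a molecular mass can be approximated from the number of residues. The Python sketch below assumes an average residue mass of 110 Da, which is a common rule of thumb and not a value taken from the paper; the constant and function names are illustrative only.

```python
AVERAGE_RESIDUE_MASS_DA = 110.0  # assumed rule-of-thumb value, not from the paper


def approx_mass_kda(n_residues: int,
                    residue_mass_da: float = AVERAGE_RESIDUE_MASS_DA) -> float:
    """Approximate the molecular mass (kDa) of a protein from its length."""
    if n_residues < 0:
        raise ValueError("number of residues must be non-negative")
    return n_residues * residue_mass_da / 1000.0


def relative_length(n_residues: int, n_reference: int) -> float:
    """Length of one protein as a fraction of a reference protein."""
    return n_residues / n_reference


if __name__ == "__main__":
    print(approx_mass_kda(110))    # ORF7 product (text reports 12.3 kDa)
    print(approx_mass_kda(1727))   # EAV ORF1a protein
    print(round(relative_length(1727, 3951), 2))  # EAV vs. IBV ORF1a protein
    print(round(relative_length(1727, 4488), 2))  # EAV vs. MHV ORF1a protein
```

The estimate for the 110-residue ORF7 product lands close to the 12.3 kDa derived from the sequence, and the length ratios show that the EAV ORF1a protein is well under half the size of its IBV and MHV counterparts.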

The EAV ORF1a product contains a number of cysteine-rich motifs (amino acids 25 to 54, 243 to 356, and 632 to 657 in Fig. 2) and some very hydrophobic regions (e.g., amino acids 525 to 575 and 890 to 970 in Fig. 2). An interesting feature is the presence of a serine protease consensus sequence between amino acids 1080 and 1220 (Fig. 6A). The alignment suggests that the histidine, aspartic acid, and serine residues at amino acid positions 1103, 1129, and 1184, respectively, form the catalytic triad of a trypsinlike serine protease. In view of the large size of the predicted EAV RNA1 translation products (187 and 346 kDa), the presence of a viral protease which can process these primary translation products into smaller active units would not be unexpected.

The relative position of the putative EAV trypsinlike protease domain in ORF1a corresponds to a putative protease motif in the ORF1a products of IBV and MHV (18, 29). However, this coronavirus domain has been proposed to be related to the picornavirus 3C-like cysteine proteases (18, 29). On the basis of sequence comparison and secondary structure predictions, the trypsinlike serine proteases and the 3C-like cysteine proteases are assumed to belong to the same protease superfamily (1, 16). Therefore, the putative EAV and coronavirus proteases can probably be considered related by common ancestry. Whether a more direct relationship exists, i.e., a relatively recent substitution of an active-site cysteine for serine in EAV or serine for cysteine in coronaviruses, is unclear from the alignment in Fig. 6B. Such a substitution has been postulated for the putative serine protease of southern bean mosaic virus, which appears to be more closely related to 3C-like cysteine proteases than to conventional serine proteases (16) (Fig. 6B). In addition to sequence similarities around the active-site residues, there is some similarity between EAV and coronaviruses in the C-terminal region of the domain, which is assumed to be involved in substrate binding (1, 16) (Fig. 6B). On the other hand, the spacing between the motifs which form the putative EAV protease domain is similar to the spacing in alphavirus, flavivirus, and pestivirus serine proteases (Fig. 6A), and EAV appears to contain the catalytic aspartic acid residue which is lacking in coronaviruses (29).

In addition to the 3C-like protease domain discussed above, possible papainlike cysteine protease motifs have been identified in the N-terminal one-third of the MHV and IBV ORF1a products (29). The sequence of the EAV ORF1a protein between amino acids 158 and 178 displays some sequence similarity with the region around the active-site cysteine-tryptophan dipeptide of cellular and alphavirus papainlike proteases (21, 27). The EAV sequence contains several histidine residues which could fulfill the role of active-site histidine in a papainlike protease. The most convincing alignment is obtained when the histidine at position 374 is used (Fig. 6C), although a 201-residue distance between the cysteine- and histidine-containing regions appears to be rather long for a papainlike protease domain. Experimental data will have to confirm the functionality and nature of both the arterivirus and the coronavirus putative proteases.

Coronaviruslike superfamily.
On the basis of computer-assisted analysis of protein sequences and in vitro transcription and translation experiments (8a), a genome organization with the general order 5'-polymerase gene-envelope protein genes-N protein gene-3' emerges for EAV. This organization is atypical of alphaviruses and rubiviruses, in which only two ORFs are present in the genome and in which the N protein gene is always present upstream of the envelope protein-encoding region (44). However, the organization of the EAV genome, its expression through the production of a 3'-coterminal nested set of mRNAs, and the translation of the second POL ORF by ribosomal frameshifting are remarkably similar to the characteristics of coronavirus (6, 42) and torovirus (39, 40) genome organization and expression.

During the last decade, various replicase modules have been recognized among positive-stranded RNA viruses. They form the basis for superfamilies of plant and animal viruses, each displaying characteristic features. The two largest superfamilies are those of the picornaviruslike and the alphaviruslike virus groups (14, 44). More recently, we proposed a coronaviruslike superfamily (40), which would comprise the coronaviruses and toroviruses and, on the basis of the data presented here, the arteriviruses.

The importance of the four conserved domains in coronavirus and torovirus polymerases (40) is underscored by their presence in the putative polymerase of the only distantly related EAV. This is not surprising for the polymerase and helicase activities (domains 1 and 3, respectively), as both are common to the replicases of positive-stranded RNA viruses. However, the conservation, both in sequence and in relative position, of domains 2 and 4 strongly suggests that they also play an important role in the replication of coronaviruslike viruses. This hypothesis is supported by the fact that domain 4 has recently been identified in the putative polymerase of lactate dehydrogenase-elevating virus (LDV) (11a). Although LDV has been referred to as a possible togavirus (53), its genome consists of multiple ORFs (12) and the LDV N protein gene is located at the 3' end of the genome (13). It is tempting to speculate that this domain performs a function which is specific for coronaviruslike viruses, e.g., the synthesis of multiple subgenomic mRNAs. Since the EAV genome is only 12.7 kb in size, an infectious EAV cDNA clone can be constructed more easily than a similar clone of a coronavirus or torovirus. This will enable us to test the biological functions of domain 4 (and also of domain 2) in the near future.

Our observations may have consequences for virus classification in general. On the basis of the similarities in polymerase expression and amino acid sequence, we postulate that the polymerase genes of arteriviruses, coronaviruses, and toroviruses have descended from a common ancestor. Logically, this evolutionary relationship should be acknowledged by taxonomic position. However, nucleocapsid architecture, a classic trait for viral taxonomy with the same ranking as the type of nucleic acid or the presence of an envelope, is icosahedral in EAV (23), helical in coronaviruses (for a review, see reference 37), and tubular in toroviruses (52).
An additional difference at the structural level is the fact that the EAV envelope does not bear the elongated spikes which are present in both coronaviruses (for a review, see reference 45) and toroviruses (24, 41). The coupling of different arrays of structural genes to the same replicase has been explained by recombination of complete genes or sets of genes (modules). Together with divergence from a common ancestor, modular evolution can account for the diverse composition of viral genomes (14, 44, 55). The joining of a coronaviruslike replicase module to a set of structural genes which confer togavirus morphology to EAV might be another example of modular evolution. Viral taxonomy is faced with the dilemma of designing a system which is both intellectually satisfying and practical.

ACKNOWLEDGMENTS

We thank Peter Bredenbeek, Raoul de Groot, and René Rijnbrand for helpful discussions and assistance. E.C. was the recipient of a postdoctoral EMBO fellowship (ALTF 131-1988). Part of this work was supported by a research grant from Duphar BV, Weesp, The Netherlands.

REFERENCES

1. Bazan, J. F., and R. J. Fletterick. 1988. Viral cysteine proteases are homologous to the trypsin-like family of serine proteases: structural and functional implications. Proc. Natl. Acad. Sci. USA 85:7872-7876.
2. Bazan, J. F., and R. J. Fletterick. 1989. Detection of a trypsinlike serine protease domain in flaviviruses and pestiviruses. Virology 171:637-639.
3. Boursnell, M. E. G., T. D. K. Brown, I. J. Foulds, P. F. Green, F. M. Tomley, and M. M. Binns. 1987. Completion of the sequence of the genome of the coronavirus avian infectious bronchitis virus. J. Gen. Virol. 68:57-77.


4. Bredenbeek, P. J., C. J. Pachuk, J. F. H. Noten, J. Charite, W. Luytjes, S. R. Weiss, and W. J. M. Spaan. 1990. The primary structure and expression of the second open reading frame of the polymerase gene of the coronavirus MHV-A59. Nucleic Acids Res. 18:1825-1832.
5. Brierley, I., M. E. G. Boursnell, M. M. Binns, B. Bilimoria, V. C. Blok, T. D. K. Brown, and S. C. Inglis. 1987. An efficient ribosomal frameshifting signal in the polymerase-encoding region of the coronavirus IBV. EMBO J. 6:3779-3785.
6. Brierley, I., P. Digard, and S. C. Inglis. 1989. Characterization of an efficient coronavirus ribosomal frameshifting signal: requirement for an RNA pseudoknot. Cell 57:537-547.
7. Devereux, J., P. Haeberli, and O. Smithies. 1984. A comprehensive set of sequence analysis programs for the VAX. Nucleic Acids Res. 12:387-395.
8. de Vries, A. A. F., E. D. Chirnside, P. J. Bredenbeek, L. A. Gravestein, M. C. Horzinek, and W. J. M. Spaan. 1990. All subgenomic mRNAs of equine arteritis virus contain a common leader sequence. Nucleic Acids Res. 18:3241-3247.
8a. de Vries, A. A. F., et al. Unpublished data.
9. Doll, E. R., J. T. Bryans, W. H. M. McCollum, and M. E. Wallace. 1957. Isolation of a filterable agent causing arteritis of horses and abortion of mares. Its differentiation from the equine (abortion) influenza virus. Cornell Vet. 47:3-41.
10. Doll, E. R., R. E. Knappenberger, and J. T. Bryans. 1957. An outbreak of abortion caused by the equine arteritis virus. Cornell Vet. 47:69-75.
11. Fuerst, T. R., E. G. Niles, F. W. Studier, and B. Moss. 1986. Eukaryotic transient-expression system based on recombinant vaccinia virus that synthesizes bacteriophage T7 RNA polymerase. Proc. Natl. Acad. Sci. USA 83:8122-8126.
11a. Godeny, E. K., and M. A. Brinton. Personal communication.
12. Godeny, E. K., D. W. Speicher, and M. A. Brinton. 1990. Sequence analysis of the genome RNA of lactate dehydrogenase-elevating virus, p. 36-40. In M. A. Brinton and F. X. Heinz (ed.), New aspects of positive-strand RNA viruses. American Society for Microbiology, Washington, D.C.
13. Godeny, E. K., D. W. Speicher, and M. A. Brinton. 1990. Map location of lactate dehydrogenase-elevating virus (LDV) capsid protein (Vp1). Virology 177:768-771.
14. Goldbach, R., and J. Wellink. 1988. Evolution of plus-strand RNA viruses. Intervirology 29:260-267.
15. Golnik, W., A. Moraillon, and J. Golnik. 1986. Identification and antigenic comparison of equine arteritis virus isolated from an outbreak of epidemic abortion of mares. J. Vet. Med. B 33:413-417.
16. Gorbalenya, A. E., A. P. Donchenko, V. M. Blinov, and E. V. Koonin. 1989. Cysteine proteases of positive strand RNA viruses and chymotrypsin-like serine proteases: a distinct protein superfamily with a common structural fold. FEBS Lett. 243:103-114.
17. Gorbalenya, A. E., A. P. Donchenko, E. V. Koonin, and V. M. Blinov. 1989. N-terminal domains of putative helicases of flavi- and pestiviruses may be serine proteases. Nucleic Acids Res. 17:3889-3897.
18. Gorbalenya, A. E., E. V. Koonin, A. P. Donchenko, and V. M. Blinov. 1989. Coronavirus genome: prediction of putative functional domains in the non-structural polyprotein by comparative amino acid sequence analysis. Nucleic Acids Res. 17:4847-4861.
19. Gorbalenya, A. E., and E. V. Koonin. 1989. Viral proteins containing the purine NTP-binding sequence pattern. Nucleic Acids Res. 17:8413-8440.
20. Habili, N., and H. Symons. 1989. Evolutionary relationship between luteoviruses and other RNA plant viruses based on sequence motifs in their putative RNA polymerases and nucleic acid helicase. Nucleic Acids Res. 17:9543-9555.
21. Hardy, W. R., and J. H. Strauss. 1989. Processing the nonstructural proteins of Sindbis virus: nonstructural proteinase is in the C-terminal half of nsP2 and functions both in cis and in trans. J. Virol. 63:4653-4664.
22. Hodgman, T. C. 1988. A new superfamily of replicative proteins. Nature (London) 333:22-23.


23. Horzinek, M. C., J. Maess, and R. Laufs. 1971. Studies on the substructure of togaviruses. II. Analysis of equine arteritis, rubella, bovine viral diarrhea and hog cholera viruses. Arch. Gesamte Virusforsch. 33:306-318.
24. Horzinek, M. C., J. Ederveen, B. Kaeffer, D. de Boer, and M. Weiss. 1986. The peplomers of Berne virus. J. Gen. Virol. 67:2475-2483.
25. Hyllseth, B. 1973. Structural proteins of equine arteritis virus. Arch. Gesamte Virusforsch. 40:177-188.
26. Jacks, T., D. H. Madhani, F. R. Masiarz, and H. E. Varmus. 1988. Signals for ribosomal frameshifting in the Rous sarcoma virus gag-pol region. Cell 55:449-458.
27. Kamphuis, I. G., J. Drenth, and E. N. Baker. 1985. Thiol proteases: comparative studies based on the high-resolution structures of papain and actinidin, and on amino acid sequence information for cathepsins B and H, and stem bromelain. J. Mol. Biol. 182:317-329.
28. Lain, S., J. L. Riechmann, and J. A. Garcia. 1990. RNA helicase: a novel activity associated with a protein encoded by a positive strand RNA virus. Nucleic Acids Res. 18:7003-7006.
29. Lee, H. J., C. K. Shieh, A. E. Gorbalenya, E. V. Koonin, N. la Monica, J. Tuler, A. Bagdzhadzhyan, and M. M. C. Lai. 1991. The complete sequence (22 kilobases) of murine coronavirus gene 1 encoding the putative protease and RNA polymerase. Virology 180:567-582.
30. Murphy, F. A. 1980. Togavirus morphology and morphogenesis, p. 241-316. In R. W. Schlesinger (ed.), The togaviruses: biology, structure and replication. Academic Press, Inc., New York.
31. Pearson, W. R., and D. J. Lipman. 1988. Improved tools for biological sequence comparison. Proc. Natl. Acad. Sci. USA 85:2444-2448.
32. Pleij, C. W. A., K. Rietveld, and L. Bosch. 1985. A new principle of RNA folding based on pseudoknotting. Nucleic Acids Res. 13:1717-1731.
33. Poch, O., I. Sauvaget, M. Delarue, and N. Tordo. 1989. Identification of four conserved motifs among the RNA dependent polymerase encoding elements. EMBO J. 8:3867-3874.
34. Porterfield, J. S., J. Casals, M. P. Chumakov, S. Y. Gaidamovich, C. Hannoun, I. H. Holmes, M. C. Horzinek, M. Mussgay, N. Oker-Blom, P. K. Russel, and D. W. Trent. 1978. Togaviridae. Intervirology 9:129-148.
35. Sawicki, S. G., and D. L. Sawicki. 1990. Coronavirus transcription: subgenomic mouse hepatitis virus replicative intermediates function in RNA synthesis. J. Virol. 64:1050-1056.
36. Sethna, P. B., S. L. Hung, and D. A. Brian. 1989. Coronavirus subgenomic minus-strand RNAs and the potential for mRNA replicons. Proc. Natl. Acad. Sci. USA 86:5626-5630.
37. Siddell, S. G., H. Wege, and V. ter Meulen. 1983. The structure and replication of coronaviruses. Curr. Top. Microbiol. Immunol. 99:131-163.
38. Snijder, E. J., J. Ederveen, W. J. M. Spaan, M. Weiss, and M. C. Horzinek. 1988. Characterization of Berne virus genomic and messenger RNAs. J. Gen. Virol. 69:2135-2144.
39. Snijder, E. J., M. C. Horzinek, and W. J. M. Spaan. 1990. A 3'-coterminal nested set of independently transcribed mRNAs is generated during Berne virus replication. J. Virol. 64:331-338.
40. Snijder, E. J., J. A. den Boon, P. J. Bredenbeek, M. C. Horzinek, R. Rijnbrand, and W. J. M. Spaan. 1990. The carboxyl-terminal part of the putative Berne virus polymerase is expressed by ribosomal frameshifting and contains sequence motifs which indicate that toro- and coronaviruses are evolutionary related. Nucleic Acids Res. 18:4535-4542.
41. Snijder, E. J., J. A. den Boon, W. J. M. Spaan, M. Weiss, and M. C. Horzinek. 1990. Primary structure and posttranslational processing of the Berne virus peplomer protein. Virology 178:355-363.
42. Spaan, W. J. M., D. Cavanagh, and M. C. Horzinek. 1988. Coronaviruses: structure and genome expression. J. Gen. Virol. 69:2939-2952.
43. Staden, R. 1986. The current status and portability of our sequence handling software. Nucleic Acids Res. 14:217-233.
44. Strauss, J. H., and E. G. Strauss. 1988. Evolution of RNA

2920

DEN BOON ET AL.

viruses. Annu. Rev. Microbiol. 42:657-683. 45. Sturman, L. S., and K. V. Holmes. 1985. The novel glycoproteins of coronaviruses. Trends Biochem. Sci. 10:17-20. 46. ten Dam, E. B., C. W. A. Pleij, and L. Bosch. 1990. RNA pseudoknots; translational frameshifting and readthrough on viral RNAs. Virus Genes 4:121-136. 47. Timoney, P. J., W. H. McCollum, A. W. Roberts, and T. W. Murphy. 1986. Demonstration of the carrier state in naturally acquired equine arteritis virus infection in the stallion. Res. Vet. Sci. 41:279-280. 48. van Berlo, M. F., M. C. Horzinek, and B. A. M. van der Zeijst. 1982. Equine arteritis virus-infected cells contain six polyadenylated virus-specific RNAs. Virology 118:345-352. 49. van Berlo, M. F., P. J. M. Rottier, M. C. Horzinek, and B. A. M. van der Zeijst. 1986. Intracellular equine arteritis virus (EAV)specific RNAs contain common sequences. Virology 152:492496. 50. van Berlo, M. F., P. J. M. Rottier, W. J. M. Spaan, and M. C.

J. VIROL.

51. 52.

53.

54.

55.

Horzinek. 1986. Equine arteritis virus (EAV)-induced polypeptide synthesis. J. Gen. Virol. 67:1543-1549. van der Zeijst, B. A. M., M. C. Horzinek, and V. Moennig. 1975. The genome of equine arteritis virus. Virology 68:418-425. Weiss, M., F. Steck, and M. C. Horzinek. 1983. Purification and partial characterization of a new enveloped RNA virus (Beme virus). J. Gen. Virol. 64:1849-1858. Westaway, E. G., M. A. Brinton, S. Y. Gaidamovich, M. C. Horzinek, A. Igarashi, L. Kaariainen, D. K. Lvov, J. S. Porterfield, P. K. Russel, and D. W. Trent. 1985. Togaviridae. Intervirology 24:125-139. Zeegers, J. J. W., B. A. M. van der Zeist, and M. C. Horzinek. 1976. The structural proteins of equine arteritis virus. Virology 73:200-205. Zimmern, D. 1987. Evolution of RNA viruses, p. 211-240. In J. J. Holland, E. Domingo, and P. Ahlquist (ed.), RNA genetics, vol. 2. CRC Press, Boca Raton, Fla.
