VIROLOGY 188, 296-310 (1992)

Analysis of Nucleotide Sequence of the Rightmost 43 kbp of Herpesvirus Saimiri (HVS) L-DNA: General Conservation of Genetic Organization between HVS and Epstein-Barr Virus

JOHN NICHOLAS,¹ KEITH R. CAMERON, HEATHER COLEMAN, CAROL NEWMAN, AND ROBERT W. HONESS

Division of Virology, National Institute for Medical Research, The Ridgeway, Mill Hill, London NW7 1AA, United Kingdom

Received December 2, 1991; accepted February 3, 1991

We present an analysis of 43,658 bp of contiguous nucleotide sequence comprising the right terminal region (conventional orientation) of the unique protein-coding component (L-DNA) of the herpesvirus saimiri (HVS) genome. Within this region lie the genes encoding the 160-kDa virion protein, which is homologous to the 140-kDa membrane antigen of Epstein-Barr virus (EBV); thymidylate synthase (TS); and the immediate-early (IE) 52-kDa protein, which is homologous to the EBV BMLF1 product. The 160-kDa gene of HVS lies at the right terminus of HVS L-DNA, whereas its homologue in EBV occurs at the left terminus of the EBV genome (conventional orientation). The TS gene of HVS occurs within a group of five genes that have no homologues in EBV. The translation product of one of these genes, ECRF3, shows amino acid sequence and hydrophobicity pattern similarities to the HCMV and cellular G-protein-coupled receptor family of proteins. Another, ECLF2, is homologous to the cyclin family of cellular proteins. The five nonconserved genes lie adjacent to the 160-kDa gene. In EBV, the region to the right of the 140-kDa gene (BNRF1) contains the latent replication origin (OriP) and the open reading frames BCRF1, BWRF1 (repeated 12 times), BYRF1, BHLF1, and BHRF1, counterparts of which are not present in this position in HVS. The subsequent 18 genes in EBV (BFLF2 to BLRF2, approximate positions 56,000-89,500) are represented in HVS, and the relative positions and orientations of these genes are directly comparable between the two viruses. There then occur a nonhomologous gene in HVS, and genes BLLF2 to BZLF1 (positions 89,500-103,200) in EBV that are not present in this region of HVS, before collinearity resumes. Thus, the HVS sequence presented here shows general collinearity between conserved genes in the right terminal region of HVS and the left terminal region of EBV, and reveals two sets of unique genes which occur in exactly analogous positions in HVS and EBV.
0042-6822/92 $3.00
Copyright © 1992 by Academic Press, Inc.
All rights of reproduction in any form reserved.

¹ To whom reprint requests should be addressed at present address: Department of Oncology, Johns Hopkins University School of Medicine, 418 North Bond Street, Baltimore, MD 21231. The sequence data from this article have been deposited with the GenBank data library under Accession No. M86409.

INTRODUCTION

Herpesviruses have large (100-240 kbp) double-stranded DNA genomes which code for many gene products (typically >50) expressed during productive cycles of infection. The genomes of the human herpesviruses Epstein-Barr virus (EBV), varicella-zoster virus (VZV), herpes simplex virus type 1 (HSV-1), and human cytomegalovirus (HCMV) have been sequenced completely (Baer et al., 1984; Davison and Scott, 1986; McGeoch et al., 1985, 1986, 1988; Perry and McGeoch, 1988; Chee et al., 1990a). Such sequence data, together with data from selected sequencing of other human and nonhuman herpesviruses, e.g., human herpesvirus (Lawrence et al., 1990; Martin et al., 1991; Neipel et al., 1991) and herpesvirus saimiri (Cameron et al., 1987; Gompels et al., 1988; Nicholas et al., 1988; Albrecht and Fleckenstein, 1990), have allowed comparisons of genetic organization and gene structure to be made between the herpesviruses. Of particular importance has been the objective grouping of the viruses, on the basis of sequence data, into the three subgroups (α, β, γ) previously assigned according to differences in host range and tissue tropism during productive and persistent infections in vivo and differences in host cell range and growth characteristics in vitro (Honess and Watson, 1977; Roizman, 1990). Thus, herpesviruses within a subgroup show greater conservation of gene structure (i.e., encoded amino acid sequence) and of the genomic organization of homologous genes among themselves than they do with herpesviruses from either of the other two subgroups. In addition to the genes conserved in all the sequenced herpesviruses, which presumably are essential replicative and structural genes, there appear to be genes specific to particular virus subgroups. For example, the neurotropic alphaherpesvirus VZV encodes homologues of the HSV immediate-early (IE) proteins IE-110K and IE-175K (Davison and Scott, 1986; McGeoch, 1989), but equivalent genes are not present in members of the beta- or gammaherpesvirus subgroups. Certain noncoding features of herpesvirus genomes are also conserved, as exemplified by the betaherpesviruses, which show a local depletion of CpG dinucleotides (fewer than expected from the G+C content) in the region of the major IE gene loci. This feature is not detected elsewhere in these genomes and is thus specific to these loci. The alphaherpesviruses show no such CpG dinucleotide suppression, whereas the gammaherpesviruses EBV, HVS, and MHV-68 show CpG suppression throughout their genomes (Honess et al., 1989; Efstathiou et al., 1990). Such CpG suppression is indicative of methylation of CpG dinucleotides by cellular enzymes, with subsequent mutation of C to T, and reflects the fact that latent viral genomes are present in dividing cells and are exposed to methylation (Honess et al., 1989).

Previous work from this laboratory (Cameron et al., 1987; Gompels et al., 1988; Nicholas et al., 1991) and others (Albrecht and Fleckenstein, 1990) has shown that the genomes of HVS and EBV are generally collinear, in that homologous genes are found in approximately equivalent locations and in the same relative orientations in the two viruses. However, it has also been reported that HVS encodes genes not found in EBV, e.g., thymidylate synthase (TS; Honess et al., 1986; Bodemer et al., 1986), dihydrofolate reductase (DHFR; Trimble et al., 1988), and an HVS-specific IE gene, IE-G (Nicholas et al., 1990). Therefore, although HVS and EBV are closely related in terms of biological properties and of similarities between homologous genes and their genomic organization, sequence data allowing determination of the extent of the genetic divergence between the viruses have not been available. The data presented in this report comprise an analysis of 43,658 bp of contiguous sequence from the right end (conventional orientation) of HVS L-DNA and demonstrate general conservation of gene sequence and organization between HVS and EBV. We also identify two loci of divergence in which "unique" genes are present in these gammaherpesvirus genomes.
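CpG depletion "less than that expected from the G+C content" is commonly expressed as an observed/expected ratio: the observed frequency of the CG step divided by the product of the C and G frequencies. The following is a minimal sketch of that ratio for one strand, plus a sliding-window profile for locating local depletion. The function names and the 1,000-bp window and 500-bp step are illustrative assumptions, not the procedure used by the authors.

```python
from typing import Iterator, Optional, Tuple


def cpg_observed_expected(seq: str) -> Optional[float]:
    """CpG observed/expected ratio for one strand of a DNA sequence.

    observed = number of CG steps / (L - 1) adjacent base pairs
    expected = f(C) * f(G), the CpG frequency predicted from composition alone
    Returns None when the ratio is undefined (too short, or no C or no G).
    """
    s = seq.upper()
    n = len(s)
    if n < 2:
        return None
    c, g = s.count("C"), s.count("G")
    if c == 0 or g == 0:
        return None
    # "CG" cannot overlap itself, so str.count finds every CpG step.
    cg = s.count("CG")
    observed = cg / (n - 1)
    expected = (c / n) * (g / n)
    return observed / expected


def cpg_profile(seq: str, window: int = 1000,
                step: int = 500) -> Iterator[Tuple[int, Optional[float]]]:
    """Yield (window_start, ratio) pairs so that local depletion can be located."""
    for start in range(0, max(len(seq) - window, 0) + 1, step):
        yield start, cpg_observed_expected(seq[start:start + window])
```

A ratio well below 1 indicates CpG suppression. Genome-wide suppression (as reported for EBV, HVS, and MHV-68) would appear as uniformly low values along the profile, whereas a locus-specific depletion would appear as a local dip.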

MATERIALS AND METHODS

Recombinant plasmids

Fragments of HVS L-DNA corresponding to KpnI-E (KpnI-SmaI fragment from the right terminus of L-DNA), EcoRI-E, and EcoRI-J (previously called EcoRI-I) were obtained from plasmid clones (Knust et al., 1983b). The EcoRI-C fragment of the genome was derived from a clone comprising this region cloned into the EcoRI site of pACYC184. The right 5.9 kbp of EcoRI-D was subcloned as a BamHI-EcoRI fragment into pUC19 (to give pdD5.9), and a 4.7-kbp PstI fragment spanning the unique BamHI site in EcoRI-D was subcloned into the PstI site of pUC19 (to give pdD4.7).
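Once a sequence is known, the sizes of the fragments used for subcloning can be predicted by locating each enzyme's recognition site. This is also how fragment assignments based on gel estimates can be checked against sequence data. The sketch below assumes a linear molecule and reports top-strand cut positions only, ignoring overhangs. The enzyme table and function names are illustrative and are not part of the software used in this study.

```python
import re
from typing import Dict, List, Tuple

# Recognition sequence and top-strand cut offset from the start of the site.
# All six sites are palindromic, so scanning one strand finds every site.
ENZYMES: Dict[str, Tuple[str, int]] = {
    "EcoRI": ("GAATTC", 1),  # G^AATTC
    "BamHI": ("GGATCC", 1),  # G^GATCC
    "PstI": ("CTGCAG", 5),   # CTGCA^G
    "SmaI": ("CCCGGG", 3),   # CCC^GGG (blunt)
    "KpnI": ("GGTACC", 5),   # GGTAC^C
    "XhoI": ("CTCGAG", 1),   # C^TCGAG
}


def cut_positions(seq: str, enzymes: List[str]) -> List[int]:
    """Sorted 0-based top-strand cut coordinates (cuts fall between bases)."""
    s = seq.upper()
    cuts = set()
    for name in enzymes:
        site, offset = ENZYMES[name]
        # A zero-width lookahead reports overlapping occurrences as well.
        for m in re.finditer(f"(?={site})", s):
            cuts.add(m.start() + offset)
    return sorted(cuts)


def digest(seq: str, enzymes: List[str]) -> List[int]:
    """Fragment lengths, left to right, for a complete digest of a linear molecule."""
    bounds = [0] + cut_positions(seq, enzymes) + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]
```

Because the fragments tile the molecule, their lengths always sum to the sequence length, which makes a quick consistency check on any predicted map.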

M13 subcloning and DNA sequencing

Nucleotide sequences of the cloned subgenomic fragments of HVS DNA were determined by the dideoxynucleotide chain termination method (Sanger et al., 1977, 1980), essentially as described by Bankier and Barrell (1983). Gel-purified subgenomic fragments were sonicated, end-repaired, and cloned into the SmaI site of M13mp18. Recombinant M13 clones (identified as white plaques on X-gal/IPTG L-agar plates) were propagated in 2-ml cultures of TG-1 cells, and single-stranded DNA templates were prepared as described previously (Bankier and Barrell, 1983). Dideoxy sequencing from these templates was done with a 17-mer universal primer and with [³⁵S]thio-dATP (>400 Ci/mmol; Amersham, UK) as the labeled nucleotide, and the products were fractionated on Tris-borate-EDTA buffer gradient polyacrylamide gels (Biggin et al., 1983).

Assembly and analysis of the DNA sequence

Sequences were assembled using the DB programs DBUTIL and DBAUTO of Staden (1982), and the corrected consensus sequence (43,658 bp from the left PstI site in EcoRI-D to the SmaI cleavage site at the right terminus of KpnI-E) was analyzed using the program packages ANALYSEQ (Staden, 1984a,b), Genetics Computer Group (GCG; Devereux et al., 1984), and Molecular Genetics and Sequencing (MGS, developed by B. Greer and P. Gillett, Computing Laboratory, National Institute for Medical Research, London). Protein compositional analyses were undertaken using options in the MGS suite of programs. For database homology searches, pairwise amino acid sequence alignments, and structural analyses of proteins, the GCG suite of programs was used. All homology searches were performed using the FASTA program (Pearson and Lipman, 1988), and amino acid sequence alignments were made using the GAP option in the GCG package (Needleman and Wunsch, 1970), with gap and length weights of 5.0 and 0.3, respectively, or of 3.0 and 0.1. Positional base preference and codon usage analyses were undertaken using the ANALYSEQ programs (Staden, 1984a,b).

RESULTS

Sequencing and identification of open reading frames

The sequence of the rightmost 43.6 kbp of HVS L-DNA was obtained from the KpnI-E, EcoRI-C, EcoRI-E, EcoRI-J, and EcoRI-D (right 8.3-kbp PstI-EcoRI portion) cloned subgenomic fragments (Knust et al., 1983b) and assembled as described under Materials and Methods. Open reading frames were generally considered significant if they were more than 300 nucleotides in length (i.e., coding capacity for 100 amino acids). Because HVS L-DNA is A+T-rich (average 34.5% G+C) and the stop codons (TAA, TAG, TGA) are themselves A+T-rich, the likelihood of open reading frames of >300 nucleotides occurring by chance is relatively low. The probability that reading frames were protein-coding was assessed by positional base preference and codon usage analyses using ANALYSEQ (Staden, 1984a,b). For codon usage analysis, standard codon frequency tables were derived from the HVS 160-kDa gene (Cameron et al., 1987) and the HVS IE 52-kDa gene (Nicholas et al., 1988). The high A+T content of HVS L-DNA leads to accumulation of A and T in the third position of codons (the degenerate position) within protein-coding regions. In noncoding regions A and T are not concentrated at any one position of the triplet, since no position is constrained by an encoded amino acid sequence. Generally, the resulting codon usage plots provided supporting evidence that the reading frames identified were indeed protein-coding (data not shown). The exceptions, none of which showed homology with EBV genes, were ECLF2, ECLF3, and intragenic or extensively overlapping reading frames (e.g., reading frames within EDRF1, ECRF1, and EERF2, and those overlapping ECLF2-ECLF3 and ECLF5-ECLF6; not shown). Figure 1 summarizes the results of the sequence analyses, showing the 27 complete reading frames and 1 incomplete reading frame (EDLF5) and the relative positions of the homologous genes in EBV. Consistent with previous findings from the sequencing of selected regions of the HVS genome (Honess et al., 1989; Albrecht and Fleckenstein, 1990), the sequence presented here shows a global underrepresentation of CpG dinucleotides (see Fig. 2), as is the case for the gammaherpesviruses EBV (Honess et al., 1989) and MHV-68 (Efstathiou et al., 1990).

Comparison of HVS and EBV genomes

All of the reading frames identified are shown in Fig. 1, and their positions and homologues in EBV, HSV, VZV, and HCMV are indicated in Table 1. Reading frames EILF1 (encoding the HVS 160-kDa protein), ECLF4 (encoding thymidylate synthase), EJRF1 (major reading frame for the IE 52-kDa transactivator), EDRF1 (HVS R reading frame), EDRF2, and EDLF5 have been described previously (Cameron et al., 1987; Honess et al., 1986; Bodemer et al., 1986; Nicholas et al., 1988, 1991). In Fig. 1, the reading frames are shown aligned against their EBV counterparts. Of particular interest is the divergence of the HVS and EBV genomes in the region of the EBV OriP and EBNA coding sequences (BWRF1 and BYRF1 reading frames) toward the left end of the EBV genome, and in a second region, between BLRF2 and BRLF1, which contains the EBNA reading frames BLRF3, BERF1, BERF2a, BERF2b, BERF3, and BERF4. The latter region also includes EBV sequences (BZLF1) encoding the trans-regulatory protein EB1 (Z), which can disrupt EBV latency (Baer et al., 1984; Countryman and Miller, 1985; Rooney et al., 1989; Takada et al., 1986). There is no HVS homologue of BZLF1, but a homologue of the adjacent BRLF1 reading frame is present in HVS (EDRF1; Nicholas et al., 1991). At the positions of divergence between the two gammaherpesvirus genomes, HVS contains five open reading frames, ECRF3, ECLF1, ECLF2, ECLF3, and ECLF4, between positions 31,987 and 39,477, and one open reading frame, EDRF2, between positions 3096 and 4163. Database searches for homologues of these reading frames revealed that ECLF1 shows sequence similarity to various proteins, including neurofilament M protein (Levy et al., 1987) and the 91.8K protein (gene 11 product) of VZV (Davison and Scott, 1986). However, ECLF1 contains imperfect repeat sequences of between 10 and 15 nucleotides, which result in a translation product rich in glutamate and alanine residues between amino acid positions 60 and 240. The sequence similarities between ECLF1 and its apparent homologues occur only over this region and thus appear to be a consequence of repeated glutamate-rich amino acid sequences rather than indicative of homologous protein sequences (see Fig. 3A). ECLF2, however, shows significant amino acid sequence similarity to cyclins from various organisms (Swenson et al., 1986; Pines and Hunt, 1987; Minshull et al., 1989, 1990; Murray and Kirschner, 1989; Lehner and O'Farrell, 1989, 1990; Wittenberg et al., 1990; Matsushime et al., 1991; Xiong et al., 1991), and illustrative examples are shown in Figs. 3B and 3C.
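The similarity claims above rest on pairwise alignments made with GCG GAP, which Materials and Methods describe as a Needleman-Wunsch global alignment run with gap and length weights of 5.0 and 0.3, or 3.0 and 0.1. The following is a minimal score-only sketch of that kind of scheme. It charges each gap a penalty of gap weight + length weight × gap length and uses Gotoh's three-matrix recurrence. It assumes a toy identity score (1.0 for a match, 0.0 otherwise) in place of the amino acid comparison table GCG uses. It also penalizes terminal gaps, which may differ from GAP's default handling. It illustrates the scoring idea and is not the GCG program.

```python
from typing import Callable

NEG_INF = float("-inf")


def identity_score(x: str, y: str) -> float:
    """Toy substitution score (an assumption, not GCG's protein comparison table)."""
    return 1.0 if x == y else 0.0


def global_affine_score(a: str, b: str, gap_weight: float = 5.0,
                        length_weight: float = 0.3,
                        score: Callable[[str, str], float] = identity_score) -> float:
    """Best global alignment score; a gap of length k costs gap_weight + length_weight * k."""
    n, m = len(a), len(b)
    open_cost = gap_weight + length_weight  # first residue of a new gap
    ext_cost = length_weight                # each further residue in the same gap
    # M: a[i-1] paired with b[j-1]; X: a[i-1] opposite a gap; Y: gap opposite b[j-1].
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    for i in range(1, n + 1):
        X[i][0] = -(gap_weight + length_weight * i)  # leading gap in b
    for j in range(1, m + 1):
        Y[0][j] = -(gap_weight + length_weight * j)  # leading gap in a
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            M[i][j] = score(a[i - 1], b[j - 1]) + max(
                M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            X[i][j] = max(M[i - 1][j] - open_cost, X[i - 1][j] - ext_cost,
                          Y[i - 1][j] - open_cost)
            Y[i][j] = max(M[i][j - 1] - open_cost, Y[i][j - 1] - ext_cost,
                          X[i][j - 1] - open_cost)
    return max(M[n][m], X[n][m], Y[n][m])
```

For example, aligning "AAAA" with "AA" needs one two-residue gap. That gap costs 5.6 at the 5.0/0.3 setting but only 3.2 at the 3.0/0.1 setting, so the lower weights tolerate gaps more readily.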
Note that the HVS cyclin, while not as similar to the cellular cyclins (types A, B, C, and D) as these are to one another, is most closely related to the D-type cyclins, in terms of both length (see Fig. 3C) and amino acid sequence (Nicholas et al., 1992). ECRF3 encodes a protein that shows appreciable amino acid sequence similarity to the HCMV US27, US28, and UL33 gene products (Chee et al., 1990a,b) and to cellular members of the G-protein-coupled receptor (GCR) family of proteins (Nicholas et al., 1992). Alignments of the ECRF3 and US28 proteins are shown in Fig. 3D. This homology and that between ECLF2 and the cyclins are discussed more fully below. At the first point of divergence between HVS and EBV (nucleotide positions 3096 to 4163 of the HVS sequence), HVS encodes a reading frame, EDRF2, which is not conserved in EBV. This HVS reading frame is,


FIG. 1. Genetic organization of the rightmost 43.6 kbp of HVS L-DNA. The L-DNA component of the HVS genome (conventional orientation) is represented as restriction endonuclease cleavage maps for EcoRI, KpnI, and XhoI, with the position of the sequenced region (expanded) shown below. Note that EcoRI fragments I and J are labeled as such on the basis of the current sequence data (i.e., EcoRI-I is larger than EcoRI-J); these new assignments represent a change from restriction maps published previously (e.g., Knust et al., 1983; Nicholas et al., 1992), which were based on estimates of fragment sizes determined by agarose gel electrophoresis. The HVS reading frames are represented as arrows (indicating direction of transcription) that are either open, to indicate homologues in EBV, or solid, to show that these genes do not have equivalent representatives in EBV. Nomenclature of reading frames follows the convention used for EBV reading frames (Baer et al., 1984) and is based on the EcoRI cleavage map. Thus EDLF1 indicates the first leftward (L) reading frame (F) initiating in the EcoRI-D (ED) fragment of the genome, whereas EDRF1 is the first rightward (R) reading frame in EcoRI-D. (EDRF1 is known to be the first rightward reading frame in EcoRI-D from the results of other sequencing studies; B. Biesinger and B. Fleckenstein, unpublished results.) Below the HVS reading frames, the corresponding genes in EBV (Baer et al., 1984) are shown. For this comparison, the appropriate segment (left 110 kbp) of the EBV genome has been reversed, and the numbering reflects this. The two segments of the EBV genome (positions 7000-55,000 and 89,500-103,200) shown below correspond to regions without counterparts in HVS. Thus, EBV genes BCRF1 to BHRF1 and BLLF2 to BZLF1 do not have homologues in analogous positions in HVS.
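The legend distinguishes rightward (R) and leftward (L) reading frames. The following is a minimal sketch of locating such frames on both strands with the length criterion given under Results (more than 300 nucleotides, i.e., coding capacity for at least 100 amino acids). It assumes ATG-to-stop ORFs, the stop codons of the standard genetic code, and coordinates on the rightward (top) strand. It is not the ANALYSEQ procedure used by the authors, and it does not assign names such as EDLF1, which also depend on the EcoRI map. All names in it are hypothetical.

```python
from typing import List, NamedTuple, Tuple

# Stop codons of the standard genetic code (assumed to apply here).
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


class Orf(NamedTuple):
    direction: str  # "R" = rightward (top strand), "L" = leftward (bottom strand)
    start: int      # 0-based top-strand coordinate of the leftmost base
    end: int        # exclusive top-strand coordinate; the stop codon is included
    n_aa: int       # amino acids encoded from ATG up to, not including, the stop


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def _scan(strand: str, min_aa: int) -> List[Tuple[int, int, int]]:
    """ATG-to-stop ORFs in the three frames of one strand.

    Returns (start, end_including_stop, n_aa) in that strand's coordinates.
    Frames still open at the end of the sequence (no stop codon, like the
    incomplete EDLF5) are not reported by this sketch.
    """
    hits = []
    for frame in range(3):
        start = None
        for i in range(frame, len(strand) - 2, 3):
            codon = strand[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame ATG opens a candidate ORF
            elif start is not None and codon in STOPS:
                n_aa = (i - start) // 3
                if n_aa >= min_aa:
                    hits.append((start, i + 3, n_aa))
                start = None
    return hits


def find_orfs(seq: str, min_aa: int = 100) -> List[Orf]:
    """Six-frame ORF scan; min_aa=100 mirrors the 300-nucleotide cut-off."""
    s = seq.upper()
    n = len(s)
    orfs = [Orf("R", a, b, k) for a, b, k in _scan(s, min_aa)]
    # Leftward ORFs: scan the reverse complement, then map back to top-strand coordinates.
    for a, b, k in _scan(revcomp(s), min_aa):
        orfs.append(Orf("L", n - b, n - a, k))
    return sorted(orfs, key=lambda o: (o.start, o.direction))
```

Reporting every ORF in top-strand coordinates keeps rightward and leftward frames directly comparable, as in the arrow diagram of the figure.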

