Gene, 118 (1992) 109-113
© 1992 Elsevier Science Publishers B.V. All rights reserved. 0378-1119/92/$05.00

The DNA polymerase-encoding gene of Bacillus subtilis bacteriophage SPO1

(Recombinant DNA; replication; amino acid sequence; sequence homology; self-splicing group-I intron)

Vincenzo Scarlato and Silvana Gargano
SUMMARY
The bacteriophage SPO1 DNA polymerase-encoding gene, which contains a self-splicing intron, has been sequenced and its amino acid (aa) sequence has been deduced. The aa sequence of SPO1 DNA polymerase shows a high degree of similarity with that of DNA polymerase I from Escherichia coli (PolI). Alignment of the sequences of PolI, and the φ29 and SPO1 DNA polymerases indicates that the aa residues that have been implicated in 3'→5' exonuclease activities are conserved.
INTRODUCTION
During Bacillus subtilis bacteriophage SPO1 development, successive phage-coded modifications of the host transcriptional apparatus control three phases of gene expression: early, middle, and late. This temporal pattern of SPO1 transcription is controlled by three regulatory genes (genes 28, 33 and 34) whose products are RNA polymerase-binding proteins, and by a fourth gene (gene 27) that codes for a protein involved in replication (Heintz and Shub, 1982; Greene et al., 1982; Geiduschek and Ito, 1982). Upon infection, early expression starts within 1 min, middle expression starts within 3-4 min (at 37°C), whereas late expression starts at the onset of DNA replication which
Correspondence to: Dr. V. Scarlato at his permanent address: Immunobiological Research Institute Siena (I.R.I.S.), Via Fiorentina 1, 53100 Siena, Italy. Tel. (39-577)293239; Fax (39-577)293561.
Abbreviations: aa, amino acid(s); bp, base pair(s); dNTP, deoxynucleoside triphosphate; gp, gene product(s); HMU, hydroxymethyluracil; kb, kilobase(s) or 1000 bp; nt, nucleotide(s); ORF, open reading frame; PolI, E. coli DNA polymerase I; PolIk, Klenow (large) fragment of E. coli DNA polymerase I; RBS, ribosome-binding site(s); SDS, sodium dodecyl sulfate; ::, novel junction.
occurs at 10 min. Phage SPO1 DNA synthesis depends on the activity of nine viral gene products (Okubo et al., 1972). Mutants in genes 21-23 and 27-32 fail to synthesize phage DNA under nonpermissive conditions. The gene 28 protein (gp28) is a member of the σ-family of transcriptional initiation factors (Gribskov and Burgess, 1986). Middle transcription is driven by gp28 attached to the RNA polymerase core. SPO1 DNA contains HMU in place of thymine, and the products of genes 23 and 29 probably represent enzymes involved in HMU biosynthesis (Glassberg et al., 1977a,b). It has been proposed that the products of genes 21 and 32 could be involved in the initiation step of SPO1 DNA replication, whereas gp22, gp30 and gp31 would be required for elongation (Glassberg et al., 1977a,b). Sequence analysis of gene 30 revealed similarity with the exonuclease of bacteriophage T4, gp46, which is involved in DNA replication, and with the replication protein P of phage λ (Scarlato and Sayre, 1992). Biochemical studies have provided evidence that the bacteriophage SPO1 DNA polymerase is coded for by gene 31 (De Antoni et al., 1985). This DNA polymerase is unusual in that it is the first DNA polymerase found to contain a self-splicing group I intron (Goodrich-Blair et al., 1990; Goodrich et al., 1989).
The aim of the present study was to determine the nt and the aa sequences of SPO1 DNA polymerase, and to compare the aa sequence with those of other DNA polymerases.

Fig. 1. Physical location of SPO1 DNA polymerase-encoding gene 31. (Upper part) EcoRI restriction map of the approx. 145-kb SPO1 genome. (Lower part) Expansion of SPO1 region coding for gene 31. Open boxes represent gene 31 exons and shaded area the intron. Transcription is rightward. Locations of the mapped middle promoters PM8-10 are indicated by upward arrows. The 1632-bp EcoRI-23 fragment was purified from a λ recombinant phage (provided by E.P. Geiduschek, University of California at San Diego, La Jolla, CA), and cloned into the plasmid vector, pUC18. The 2800 bp EcoRI-PvuII fragment, adjacent to the EcoRI-23 fragment, was cloned in pUC8 as described (Goodrich et al., 1989; Goodrich-Blair et al., 1990). A set of subclones was generated by using suitable internal restriction sites, or by using BAL 31 nuclease digestions. Bacterial strains used for propagation and isolation of plasmid DNA were E. coli HB101 and DH5.

EXPERIMENTAL AND DISCUSSION
(a) The nt and predicted aa sequences of SPO1 DNA polymerase
Marker rescue of an amber mutation in gene 31 was observed upon recombination with a plasmid containing the restriction fragment EcoRI-23 (Curran and Stewart, 1985). This DNA fragment (Fig. 1) is adjacent to the larger SPO1 genomic fragment EcoRI-9 in which a self-splicing group-I intron was identified (Goodrich-Blair et al., 1990; Goodrich et al., 1989). Analysis of the nt sequence of EcoRI-23, and of the flanking restriction fragments, shows an ORF that starts just downstream from the middle promoter PM8 (Greene et al., 1984; Scarlato et al., 1991) and terminates within the unspliced intron (Goodrich-Blair et al., 1990). A separate 522-bp ORF lies within the intron. Ligation of the spliced message brings together two exons that constitute the DNA polymerase gene (Fig. 1). The nt sequence of the SPO1 DNA polymerase gene and the flanking sequences are shown in Fig. 2. The calculated Mr of the 924-aa sequence predicted for the SPO1 DNA polymerase is 106808, which is in agreement with the reported size (105 kDa) estimated by SDS-polyacrylamide gel electrophoresis (De Antoni et al., 1985). This protein contains 23% of acidic, 16% of basic, and 35% of hydrophobic aa residues. The start codon ATG for the SPO1 DNA polymerase gene is preceded by a region complementary to the 3' terminus of B. subtilis 16S rRNA (Moran et al., 1982). A similar RBS has been found to overlap the stop codon of the gene 31 ORF. Eight bp downstream from this RBS there is a possible start codon for an unidentified ORF.
(b) Comparison of aa sequences between phage SPO1 DNA polymerase and PolI of Escherichia coli
A search of the Genetics Computer Group data base (Devereux et al., 1984) showed the SPO1 DNA polymerase to be strongly homologous to PolI. The aa sequence alignment of SPO1 DNA polymerase and PolI (Fig. 3) showed that 53.04% of aa residues are conserved (28.73% identity). The majority of the identical residues (total of 36.8%) were found in the C-terminal portion of the two proteins. In this region, residues Lys758, Tyr766 and His881 of PolI were identified as being implicated in dNTP-binding (Joyce and Steitz, 1987; Pandey et al., 1987). These three residues are conserved in SPO1 DNA polymerase (Fig. 3) as well as in bacteriophages T5 and T7 DNA polymerases (Leavitt and Ito, 1989). Comparison between a number of DNA polymerase aa sequences has suggested that there may be several distinct evolutionary groups of DNA polymerases. Based on primary structural similarity, the SPO1 DNA polymerase belongs to a group that includes PolI, bacte-
[Fig. 2: nt sequence of the SPO1 DNA polymerase gene (gene 31) and flanking sequences, with the deduced aa sequence shown beneath the nt sequence.]
Fig. 2. The nt and aa sequences of the DNA polymerase gene. Sequences from bp 1-90 and from 1715-1989 were taken from Greene et al. (1984) and Goodrich-Blair et al. (1990), respectively. These sequences and the sequence of the EcoRI-23 fragment were assembled to yield the sequence of gene 31. The -10 and -35 DNA regions of the PM8 middle promoter (see Fig. 1) are underlined. RBS are overlined. The junction of exon 1 and exon 2 is indicated by double colons (::). Both strands were sequenced by the method of Sanger et al. (1977) either using PolIk (Boehringer Mannheim Biochemicals), or Sequenase (US Biochemical Corp., Cleveland, OH). In the cases in which the generated subclones were too long or the sequence resolution was poor, the DNA sequencing was carried out with synthetic oligodeoxynucleotides as primers (Applied Biosystem Synthesizer). DNA manipulations were performed by standard methods (Maniatis et al., 1982). This sequence data will appear in the GenBank Nucleotide Sequence Database under accession No. M84415.
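The calculated Mr and the acidic, basic and hydrophobic fractions quoted in the text are all derived from the deduced aa sequence. The following is a minimal sketch of how such values can be computed from a one-letter aa string. It is not the procedure used in this study: the residue masses are standard average values, the three residue classes are assumptions (the text does not state which residues were counted in each class), and the short test string is only the N-terminal stretch as read from Fig. 2.

```python
# Average residue masses in daltons (amino acid minus one water molecule).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153  # added once for the free N- and C-termini

# Assumed residue classes; the paper does not list the residues it counted.
ACIDIC = frozenset("DE")
BASIC = frozenset("KRH")
HYDROPHOBIC = frozenset("AVLIMFWP")


def molecular_mass(seq: str) -> float:
    """Average molecular mass (Da) of an unmodified polypeptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def class_fraction(seq: str, residues: frozenset) -> float:
    """Fraction of positions in seq whose residue belongs to `residues`."""
    return sum(aa in residues for aa in seq) / len(seq)


if __name__ == "__main__":
    # Illustrative input: first residues of the deduced sequence (Fig. 2).
    n_terminus = "MGSALDTLKEFNPKPMKGQ"
    print(f"mass: {molecular_mass(n_terminus):.1f} Da")
    for name, group in (("acidic", ACIDIC), ("basic", BASIC),
                        ("hydrophobic", HYDROPHOBIC)):
        print(f"{name}: {100 * class_fraction(n_terminus, group):.1f}%")
```

Applied to the full 924-aa sequence, the same two functions would give a mass and class fractions comparable to those reported, although the exact percentages depend on which residues are assigned to each class.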
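The percentages of conserved and identical residues reported for the SPO1 DNA polymerase/PolI comparison are counts over aligned columns. A minimal sketch of such a count is given here under stated assumptions: gaps are written as "-", columns containing a gap are excluded from the denominator, and the groups of similar residues are a hypothetical illustration rather than the scoring scheme of the alignment program used in this study. The function name and the example rows are likewise illustrative.

```python
# Hypothetical groups of residues treated as conservative replacements.
SIMILAR_GROUPS = ("ILVM", "FYW", "KRH", "DE", "NQ", "ST", "AG")


def alignment_stats(row_a: str, row_b: str, groups=SIMILAR_GROUPS):
    """Return (percent identity, percent conserved) for two aligned rows.

    Columns where either row has a gap ("-") are skipped. Identical
    columns also count as conserved.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    compared = identical = conserved = 0
    for a, b in zip(row_a, row_b):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b:
            identical += 1
            conserved += 1
        elif any(a in g and b in g for g in groups):
            conserved += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100 * identical / compared, 100 * conserved / compared


if __name__ == "__main__":
    # Toy rows, not taken from Fig. 3.
    identity, similarity = alignment_stats("MKV-LD", "MRIALE")
    print(f"identity {identity:.2f}%, conserved {similarity:.2f}%")
```

Whether gapped columns are included in the denominator changes the result, so figures computed this way are only comparable when the same convention is used for both.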
[Fig. 3: aa sequence alignment of SPO1 DNA polymerase (upper rows) and PolI (lower rows).]