Vol. 58, No. 2

INFECTION AND IMMUNITY, Feb. 1990, p. 480-488 0019-9567/90/020480-09$02.00/0 Copyright © 1990, American Society for Microbiology

Molecular Characterization of the Clostridium difficile Toxin A Gene

C. H. DOVE, S.-Z. WANG, S. B. PRICE,† C. J. PHELPS, D. M. LYERLY, T. D. WILKINS, AND J. L. JOHNSON*

Department of Anaerobic Microbiology, Virginia Polytechnic Institute and State University, Blacksburg, Virginia 24061

Received 10 July 1989/Accepted 21 October 1989

The gene encoding the toxin A protein of Clostridium difficile (strain VPI 10463) was cloned and sequenced. The coding region of 8,133 base pairs has a mol% G+C of 26.9 and encodes 2,710 amino acids. The deduced polypeptide has a molecular mass of ca. 308 kilodaltons. Nearly a third of the gene, at the 3' end, consists of 38 repeating sequences. The repeating units were grouped into two classes, I and II, on the basis of length and the low levels of DNA sequence similarities between them. There were seven class I repeating units, each containing 90 nucleotides, and 31 class II units, which, with two exceptions, were either 60 or 63 nucleotides in length. On the basis of DNA sequence similarities, the class II repeating units were further segregated into subclasses: 7 class IIA, 13 class IIB, 5 class IIC, and 6 class IID. The dipeptide tyrosine-phenylalanine was found in all 38 repeating units, and other amino acid sequences were unique to a specific class or subclass. This region of the protein has epitopes for the monoclonal antibody PCG-4 and includes the binding region for the Galα1-3Galβ1-4GlcNAc carbohydrate receptor. Located 1,350 base pairs upstream from the toxin A translation start site is the 3' end of the toxin B gene. Between the two toxin genes is a small open reading frame, which encodes a deduced polypeptide of ca. 16 or 19 kilodaltons. The role of this open reading frame is unknown.
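The mol% G+C figure quoted above is a simple base-composition ratio. The following is a minimal sketch of how such a value could be computed from a sequence string; the function name `gc_percent` and the 20-base example fragment are hypothetical and are not taken from the toxin A sequence or from the analysis software used in the study.

```python
def gc_percent(seq: str) -> float:
    """Return mol% G+C of a DNA sequence, counting only A, C, G, and T."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    if total == 0:
        raise ValueError("sequence contains no A, C, G, or T")
    return 100.0 * gc / total


# Hypothetical 20-base fragment used only to exercise the function.
example = "ATGTCTTTAATATCTAAAGA"
print(gc_percent(example))

# Rough count of G or C bases implied by the reported values
# (26.9 mol% G+C over 8,133 base pairs).
approx_gc_bases = round(0.269 * 8133)
print(approx_gc_bases)
```

Ambiguous bases are ignored here by design, so a sequence containing N characters is scored only on its unambiguous positions.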

Clostridium difficile is the major causative agent of pseudomembranous colitis in humans (2). The organism produces two toxins, designated toxin A and toxin B (1, 32, 33). They are both cytotoxic and lethal for animals, although toxin B is about 1,000-fold more cytotoxic than toxin A for most cell lines. Both toxins appear to be produced in all toxigenic strains; however, the toxicity of strains may vary by several orders of magnitude (20, 32). The actions of these toxins appear to be quite complex and at present are not understood. Although toxin A has a direct toxic effect on the intestinal mucosa, toxin B does not cause a significant response when given intragastrically to hamsters, unless it is initially mixed with a small amount of toxin A (19). Alternatively, toxin B is also toxic if it is given to hamsters with bruised (injured) ceca. The results are consistent with the initial binding and primary tissue damage being caused by toxin A or by mechanical injury, followed by the entry of toxin B. Investigators in a number of laboratories have worked on the isolation and physical properties of the toxins (1, 2, 27, 30, 32). Both toxins have been purified to homogeneity (18, 32). An interesting and controversial property of the toxins has been their molecular weights. Initial molecular weight estimations obtained by using native proteins have ranged from 440,000 to 600,000 for toxin A and 360,000 to 500,000 for toxin B (1, 30, 32). However, in later studies there has been controversy as to whether the toxins dissociate into smaller subunits under denaturing conditions. Under these conditions, size estimations for the toxins range from 300,000 (18) to 50,000 (27, 30) and down to 42,500 and 16,000 (29). These discrepancies have been summarized by Lyerly et al. (16) and are difficult to explain (35). Perhaps in some of the isolation procedures a smaller contaminating protein copurified with the toxins and tended to mask the toxins in the polyacrylamide gels. 
The best approach to resolve this controversy is to clone and sequence the toxin genes.

Several investigators have begun cloning these genes. Muldrow and his collaborators (26) have reported the cloning of a 0.3-kilobase-pair (kb) fragment of the toxin A gene in the lambda bacteriophage expression vector gt11. The expressed peptides reacted with toxin A polyvalent antisera. When the cloned fragment from toxin A was used as a labeled probe, it reacted with a PstI-generated fragment of C. difficile DNA, which they estimated as 4.5 kb. We have cloned a 4.7-kb PstI fragment into a plasmid vector (28). This fragment has an internal PstI site which is protected from digestion in the C. difficile DNA. When this fragment is expressed, the peptide reacts with both toxin A affinity-purified polyclonal antisera and with the monoclonal antibody PCG-4 (17). Preliminary results on the sequencing of this fragment have shown that there are many repeating sequence units within the fragment (14). Eichel-Streiber et al. (8) have recently cloned portions of the 4.7-kb PstI fragment into a plasmid expression vector and obtained an expression product that also reacted with toxin A antisera. Wren et al. (36) have reported the cloning of toxin A in lambda phage. The clone expressed a protein that caused elongation of Chinese hamster ovary cells, and this protein had an estimated molecular weight of 235,000. In this study, we have completed the cloning and sequencing of toxin A and its flanking regions. Nearly one-third of the gene (from the 3' end) consists of a series of repeating units which appear to code for the receptor portion of the toxin.

MATERIALS AND METHODS

Bacteria, bacteriophages, and plasmids. DNA isolated from C. difficile VPI 10463 was used for cloning. Plasmids pBR322, pUC18, and pUC19 were used for the primary cloning of C. difficile DNA fragments. Subclones in the M13 phages mp18 and mp19 were used for DNA sequencing. The plasmids, phages, and Escherichia coli host strains JM109, DH5α, and DH5αF' were all obtained from Bethesda Research Laboratories, Inc.
E. coli strain Chi 1776 was purchased from the American Type Culture Collection.

* Corresponding author.
† Present address: Department of Microbiology and Immunology, University of Kentucky Medical Center, Lexington, KY 40536.

FIG. 1. Partial endonuclease restriction map of the cloned toxin A region from C. difficile strain 10463. Also shown are the sizes and locations of primary clones pCD11, pCD11L, pCD11R-6, pCD17, and pCD19. [Restriction map not reproduced; scale in kilobases.]

Enzymes and radiolabeled compounds. Restriction endonuclease enzymes were purchased from Bethesda Research Laboratories, Inc., International Biotechnologies, Inc., or Promega Biotec. Endonucleases III and VII, E. coli DNA polymerase I, and the polymerase I Klenow fragment were obtained from Bethesda Research Laboratories, Inc. DNA ligase and calf alkaline phosphatase were purchased from Boehringer Mannheim Biochemicals. The T7 DNA polymerase, Sequenase, was purchased from U.S. Biochemical. The labeled nucleotide triphosphate [α-35S]dATP was obtained from New England Nuclear. Random primer labeling kits were obtained from International Biotechnologies, Inc. All of the enzymes were used according to the instructions provided by the manufacturers.

DNA isolations. High-molecular-weight C. difficile DNA was isolated by using a variation of the Marmur procedure (13, 22). The harvested cells (12) were suspended in 50 mM Tris-1 mM EDTA buffer (pH 8.0). After 50 µg of lysozyme per ml was added, the cell suspension was incubated at 37°C until the cells were susceptible to sodium dodecyl sulfate disruption. At the time of disruption, EDTA was added to a final concentration of 40 mM, proteinase K was added to a final concentration of 30 to 50 µg/ml, and β-mercaptoethanol was added to a concentration of 1% to inhibit endogenous nuclease activity in the lysate. Plasmid DNA and the replicating-form DNA of M13 phage were isolated by the Birnboim and Doly alkaline lysis procedure (4). DNA preparations used for probe fragment generation and nested deletions were further purified by CsCl centrifugation. Specific DNA restriction fragments were separated from others on low-melting-point agarose, and the individual bands were cut from the gel for use in the random priming labeling procedure. When restriction fragments of C. difficile DNA were needed in a particular size range (i.e., for cloning by chromosome walking), the gel was cut at the lower size range and this part was removed. A well was then cut into the gel at the upper size range, the polarity of the electrophoresis unit was reversed, and the fragments were electroeluted into the well. This tended to concentrate the fragments entering the well and resulted in a lower eluate volume. The fragments were then ethanol precipitated.

Primary cloning. Of the four primary clones used in this study (Fig. 1), three were cloned by the chromosome walking approach. The 2.6-kb PstI fragment of pCD11 was used as a probe for cloning pCD11L, the 0.5-kb HindIII-EcoRI fragment of pCD11L was used as a probe for cloning pCD17, which was then used as a probe to clone pCD19, and pCD11 was used to detect clone pCD11R-6. The cloning was carried out under EK-2, BL-2 containment with E. coli Chi 1776 as host. Cells were made competent by the Hanahan procedure (11, 21).

Toxicity assays. Lysates from each primary clone were checked for animal and cell toxicity. Mouse lethality tests were performed by injecting five 8-week-old BALB/c mice (Dominion Laboratories, Dublin, Va.) intraperitoneally with 200 µl of lysate and observing them for illness or death. Cytotoxicity was checked in the Chinese hamster ovary (CHO) cell assay by following a procedure previously described (7). Lysates of E. coli Chi 1776 transformed with pUC18 were used as negative controls.

DNA sequencing and sequence analysis. Both strands of the DNA were sequenced by using the dideoxy-chain termination procedure developed by Sanger et al. (3, 31). DNA fragments were cloned into M13, and nested deletions were generated in replicating-form DNA by using the exonuclease III and exonuclease VII procedure (37). Restriction sites used for subcloning were sequenced across by using oligonucleotide primers and double-stranded sequencing. Synthetic oligonucleotide primers were also used for filling occasional sequence gaps not covered by the nested deletions. Sequence analysis was done by using the Pustell programs from International Biotechnologies, Inc., and the Sequence Analysis Software Package from the Genetics Computer Group, University of Wisconsin. The data bases that were searched included the GenBank data base and the National Biomedical Research Foundation Protein Sequence Data Base. Unweighted Pair Group cluster analysis was done by using the NTSYS-pc programs (F. J. Rohlf, Exeter Publishing, Ltd.).

N-terminal sequencing. Toxin A was purified from culture filtrates of C. difficile VPI 10463 by sequential ammonium sulfate precipitation, ion-exchange chromatography, and precipitation at pH 5.6 as previously described (18).
The highly purified protein was denatured with a final concentration of 2.5% sodium dodecyl sulfate-5% 2-mercaptoethanol at 100°C for 2 min and subjected to sodium dodecyl sulfate-polyacrylamide gel electrophoresis. After electrophoresis, the protein was transferred to polyvinylidene difluoride membranes by electroblotting and the N-terminal amino acid sequence was determined by previously described methods (23).

FIG. 2. Nucleotide and deduced amino acid sequences of C. difficile toxin A gene. [Sequence listing not reproduced.]

FIG. 3. Nucleotide sequence similarity cluster analysis of the class I repeating sequences. [Dendrogram not reproduced; horizontal axis, percent DNA sequence similarity. Nucleotide positions of the class I units: I1, 5671-5760; I2, 6073-6162; I3, 6475-6564; I4, 6817-6906; I5, 7219-7308; I6, 7558-7647; I7, 7831-7920.]

RESULTS

Primary clones of toxin A. Relationships between the five primary clones, each containing a portion of the C. difficile toxin A gene, are shown in Fig. 1. Also included in the figure is a partial restriction map of this 15-kb region of the C. difficile genome. Clone pCD11 has been partially characterized and shown to contain a carbohydrate binding region and antigenic epitopes which react with the monoclonal antibody PCG-4 (28). Clone pCD11R-6, in addition to containing the entire pCD11 insert and most of the pCD11L insert, contains the last 80 bases of the toxin A gene and approximately 4.1 kb of additional sequences downstream from the toxin A gene. The downstream region contains two open reading frames (ORFs) and part of a third, one of which is shown in Fig. 1. All of these ORFs read in the direction opposite that of the toxin A gene (data not shown). Clone pCD11L contains an additional 1.5 kb of sequence upstream of the pCD11 insert. Clone pCD17 was used as a probe for cloning pCD19. Clone pCD19 codes for the 5' end of toxin A, a small ORF that could code for a 16- or 19-kilodalton (kDa) protein, and 1.2 kb of toxin B. These clones were not toxic for mice or CHO cells. The clone immediately upstream from the pCD19 insert was found to contain the remainder of the toxin B gene, and we have since been able to reconstruct the intact gene in a plasmid. The recombinant protein expressed by this plasmid is cytotoxic to tissue cells, is lethal to mice, and has immunological identity with toxin B (D. M. Lyerly and J. L. Johnson, unpublished data).

Nucleotide and amino acid sequences for toxin A. The nucleotide sequence and the deduced amino acids for the toxin A gene are shown in Fig. 2 (GenBank accession number, M30307). The open reading frame is 8,133 nucleotides long and codes for 2,710 amino acids. The gene contains 26.9 mol% G+C, and the deduced protein has a molecular mass of 308,103 Da.
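The relationship between the reported open reading frame length, the number of deduced amino acids, and the deduced molecular mass can be checked with a few lines of arithmetic. This is a minimal sketch; it assumes that the 8,133-nucleotide figure includes a single terminating stop codon, which the text does not state explicitly, and the names used are illustrative only.

```python
ORF_NT = 8133       # open reading frame length in nucleotides (from the text)
RESIDUES = 2710     # deduced amino acids (from the text)
MASS_DA = 308_103   # deduced molecular mass in daltons (from the text)


def codons_in_orf(orf_nt: int) -> int:
    """Return the number of codons in an ORF whose length is a multiple of 3."""
    if orf_nt % 3 != 0:
        raise ValueError("ORF length is not a whole number of codons")
    return orf_nt // 3


codons = codons_in_orf(ORF_NT)
# Assumption: one of the codons is the stop codon and encodes no residue.
coding_codons = codons - 1
mean_residue_mass = MASS_DA / RESIDUES

print(codons, coding_codons, round(mean_residue_mass, 1))
```

Under that assumption the codon count agrees with the 2,710 deduced residues, and the mean residue mass falls between 113 and 114 Da.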
The amino acid sequence of the N-terminal end of toxin A was determined by microanalysis after electrophoresis under denaturing conditions (23), and the first 10 amino acids agree with the first 10 deduced amino acids of the toxin A open reading frame, indicating that there are no posttranslational modifications involving a signal peptide. An interesting property of this gene is the repeating sequences at the 3' end. A total of 2,551 nucleotides, or 31.5% of the gene, are in 38 contiguous repeating units. This region extends from nucleotides 5,545 to 8,106. The repeating units were grouped into two classes, I and II, on the basis of the low levels of DNA sequence similarities between

FIG. 4. Nucleotide sequence similarity cluster analysis of the class II repeating sequences. (Dendrogram not reproduced; horizontal axis, percent DNA sequence similarity, 50 to 100.)

them. There are 7 class I and 31 class II repeating units. Each of the class I repeats is 90 nucleotides long, and the class II repeats are either 60 or 63 nucleotides long, with one exception that is 66 nucleotides long. The class II repeats have been subdivided into 7 class IIA, 13 class IIB, 5 class IIC, and 6 class IID repeats. Nucleotide sequence similarities among the class I repeats are shown in Fig. 3. Similarities ranged from 73 to 98%, with the average values in the cluster analysis being 80% or greater. Nucleotide sequence similarities among the class II repeats are shown in Fig. 4. With the exception of class IID, clustering within each subclass is high, being 70% or higher for class IIA, 65% or higher for class IIB, and 76% or higher for class IIC repeats. The class IID repeats are a diverse collection, in that all are very distinct. Two of them fit closer to the class IIB cluster and one fits closer to the class IIC group than to the others in class IID. This is also the only group in which there is any size variation; repeat unit IID4 has an extra AAA codon, while IID5 has one fewer codon.

The deduced amino acid residues for the repeated sequences are shown in Fig. 5 and 6. The inclusive amino acid residue numbers are given for each repeat unit, and the conserved amino acid residues within each class or class subgroup are boxed. Seventy percent of the amino acids in the class I peptides are conserved among the units, while less than 50% are conserved within each of the class II subgroups. The dipeptide tyrosine-phenylalanine (YF) is the most conserved and can be found in all 38 repeat units. It represents residues 14 and 15 in the class I units and residues 15 and 16 in the class II repeats, except for unit IID4. Base differences in the regions of conserved amino acids involved the codon's third base, as expected, whereas switching from one amino acid to another in a given position usually involved a total codon change or at least two of the bases.

A hydropathic index plot for the deduced toxin A protein and a map of the repeat units are shown in Fig. 7. There is no evidence for a signal peptide at the amino-terminal end; this finding is in agreement with the lack of posttranslational modification of the N-terminal end of the protein. The only strongly hydrophobic region in the deduced protein is from residues 1,050 to 1,100. There appears to be a periodicity in the hydropathic index within each repeat region. However, the repeat region is for the most part hydrophilic.

The 160 bases immediately upstream from the toxin A translation initiation site are shown in Fig. 2. There appears to be a ribosomal binding site (GGAGGT) starting six bases upstream of the initiation codon. Since we do not know where transcription initiates, it is difficult to predict promoter regions, although there are several TA-rich areas in the region of 160 bases upstream (Fig. 2). Other than these, there do not appear to be any other unique structures, such as inverted or tandem repeats.

Small protein. A small ORF (ca. 500 base pairs; Fig. 8; GenBank accession number, 30308) is located 122 bases downstream from the stop codon of the toxin B gene. Although the deduced amino acid sequence begins with the first start codon, there are two additional ATG codons at the amino acid residue positions 25 and 27. There appear to be ribosomal binding sites in the -10 regions of the first (GGTGGA) and third (GGAGGC) ATG codons. The deduced protein would have a molecular mass of 18,798 Da by using the longer sequence and a 15,878-Da molecular mass by using the shorter sequence. The pI values for the two peptides are 9.22 and 9.11, respectively. The hydropathic

Class I peptides
  Unit       Residues
  I1         1891-1920
  I2         2025-2054
  I3         2159-2188
  I4         2273-2302
  I5         2407-2436
  I6         2520-2549
  I7         2611-2640
  CONSENSUS  NQIGVFKGPNGFEYFAPANTDANNIEGQAI
[The per-unit class I sequences are not recoverable from this copy.]

FIG. 5. Deduced amino acid sequences for the class I repeating units. Unit designations (I1 to I7) are listed in order from the N-terminal to C-terminal direction. The inclusive amino acid residue numbers are given for each unit, and the conserved amino acids are boxed.

Class IIA peptides
  Unit       Residues    Sequence
  A1         1921-1940   (garbled)
  A2         2055-2074   (garbled)
  A3         2189-2208   LYQNEFLTLNGKKYYFGSDS
  A4         2303-2322   VYQNKFLTLNGKKYYFDNDS
  A5         2437-2456   (garbled)
  A6         2550-2569   (garbled)
  A7         2641-2660   (garbled)
  CONSENSUS              VYQNKFLTLNGKKYYFGNNS

Class IIB peptides
  Unit       Residues    Sequence
  B1         1849-1869   (garbled)
  B2         1941-1961   (garbled)
  B3         2075-2095   (garbled)
  B4         2096-2116   (garbled)
  B5         2117-2137   (garbled)
  B6         2209-2229   (garbled)
  B7         2323-2343   (garbled)
  B8         2344-2364   (garbled)
  B9         2365-2385   (garbled)
  B10        2457-2477   (garbled)
  B11        2478-2498   (garbled)
  B12        2570-2590   (garbled)
  B13        2661-2681   (garbled)
  CONSENSUS              KAVTGWQTIDGKKYYFNTNTA

Class IIC peptides
  Unit       Residues    Sequence
  C1         1870-1890   (garbled)
  C2         2004-2024   (garbled)
  C3         2138-2158   IASTGYTIINGKHFYFNTDGI
  C4         2386-2406   IASTGYTSINGKHFYFNTDGI
  C5         2499-2519   IASTGYTIISGKHFYFNTDGI
  CONSENSUS              IASTGYTIINGKHFYFNTDGI

Class IID peptides
  Unit       Residues    Sequence
  D1         1962-1982   IAAVGLQVIDNNKYYFNPDTA
  D2         1983-2003   IISKGWQTVNGSRYYFDTDTA
  D3         2230-2250   IAAIHLCTINNDKYYFSYDGI
  D4         2251-2271   LQNGYITIERNNFYFDANNESK
  D5         2591-2611   MGANGYKTIDNKNFYFRNGL
  D6         2682-2702   MAAAGGLFEIDGVIYFFGVDG
  CONSENSUS              IAA-G--TI-N--YYF--D--
[Sequences are listed only where a complete row survives in this copy; "(garbled)" marks rows whose fragments could not be reassembled with confidence.]

FIG. 6. Deduced amino acid sequences for the class II repeating units. Unit designations are made in the same manner as for the class I units.
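The repeat-unit bookkeeping described in the text can be reproduced in a few lines. The Python sketch below is illustrative only: it tallies the unit lengths against the stated nucleotide span (5,545 to 8,106) and locates the conserved tyrosine-phenylalanine dipeptide in a few units. The per-subclass lengths for classes IIA (60 nucleotides) and IIB and IIC (63 nucleotides) are an assumption inferred from the residue ranges in Fig. 6, the peptide strings are transcribed from the extracted Fig. 5 and 6 and may carry transcription errors, and all names are hypothetical.

```python
# Nucleotides per repeat unit. Class IID: IID4 carries an extra codon (66 nt)
# and IID5 one fewer codon (60 nt), as stated in the text.
UNIT_LENGTHS = {
    "I": [90] * 7,
    "IIA": [60] * 7,    # assumed from 20-residue ranges in Fig. 6
    "IIB": [63] * 13,   # assumed from 21-residue ranges in Fig. 6
    "IIC": [63] * 5,    # assumed from 21-residue ranges in Fig. 6
    "IID": [63, 63, 63, 66, 60, 63],
}
GENE_LENGTH = 8133
REPEAT_START, REPEAT_END = 5545, 8106  # inclusive nucleotide positions

unit_count = sum(len(lengths) for lengths in UNIT_LENGTHS.values())
repeat_total = sum(sum(lengths) for lengths in UNIT_LENGTHS.values())
span = REPEAT_END - REPEAT_START + 1
percent_of_gene = 100.0 * repeat_total / GENE_LENGTH


def yf_residues(peptide):
    """Return the 1-based positions of the first Tyr-Phe dipeptide, or None."""
    index = peptide.find("YF")
    return None if index < 0 else (index + 1, index + 2)


# Peptides transcribed from the extracted figures (possible transcription errors).
PEPTIDES = {
    "class I consensus": "NQIGVFKGPNGFEYFAPANTDANNIEGQAI",
    "IIC3": "IASTGYTIINGKHFYFNTDGI",
    "IID4": "LQNGYITIERNNFYFDANNESK",
}

print(unit_count, repeat_total, span, round(percent_of_gene, 1))
for name, peptide in PEPTIDES.items():
    print(name, len(peptide) * 3, yf_residues(peptide))
```

With these inputs the unit lengths add up to the stated span, and the dipeptide positions agree with the description of the class I units, the class II units, and the IID4 exception.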


[FIG. 7. Hydropathic index plot of the deduced toxin A protein with a map of the repeat units. Plot not reproduced; horizontal axis, residues, with recovered ticks from 250 to 1750.]
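A hydropathic index profile of the kind described for Fig. 7 can be approximated with a sliding-window average. The Python sketch below is a minimal illustration and not the authors' procedure: it assumes the Kyte-Doolittle scale (positive values are hydrophobic) and a 9-residue window, neither of which is stated in this excerpt, and it uses the class I consensus peptide from Fig. 5 as a stand-in for the full protein. All names are hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the excerpt does not name one).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(peptide, window=9):
    """Mean hydropathy of every full window along the peptide."""
    if window < 1 or window > len(peptide):
        raise ValueError("window must be between 1 and the peptide length")
    scores = [KD[residue] for residue in peptide]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


# Class I consensus peptide as transcribed from Fig. 5 (stand-in input).
CLASS_I_CONSENSUS = "NQIGVFKGPNGFEYFAPANTDANNIEGQAI"

profile = hydropathy_profile(CLASS_I_CONSENSUS, window=9)
mean_index = sum(KD[r] for r in CLASS_I_CONSENSUS) / len(CLASS_I_CONSENSUS)
print(len(profile), round(mean_index, 2))
```

On this scale the consensus peptide has a negative mean index, which is in line with the observation that the repeat region is for the most part hydrophilic. Applying the same function to consecutive repeat units would be one way to look for the periodicity noted in the text.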
