Molecular and Biochemical Parasitology, 45 (1991) 179-182


Elsevier MOLBIO 01484

Short Communication

Clonal variation in the Plasmodium falciparum circumsporozoite protein gene

Michael J. Lockyer

Department of Molecular Biology, Wellcome Biotech, Beckenham, Kent, U.K.

(Received 11 October 1990; accepted 29 October 1990)

Key words: Circumsporozoite protein gene; Plasmodium falciparum; Malaria

Structural variation in the circumsporozoite protein (CSP) gene of Plasmodium falciparum, a malaria vaccine candidate antigen, has been shown in studies on laboratory isolates [1-6]. The variation occurs as length heterogeneity in the amino-terminal region, in the number of tandem NANP repeats and in the interspersion pattern of NVDP repeats. A number of amino acid substitutions, predominantly in the C-terminal region, map to T-cell epitope domains [4-7]. Individuals are often infected with a mixed population of P. falciparum parasites [8], and T-cell epitope sequences corresponding to five different CSP genes have been found in a single individual [6]. The complete CSP gene sequences of two clones derived from a single isolate are presented here.

The T9 clones T9-98 and T9-101 were obtained from D. Walliker, University of Edinburgh, Scotland. In vitro growth and genomic DNA preparation were as previously described [9]. DNA from both strains was digested with mung bean nuclease [1] and cloned into the EcoRI site of λgt11 (Stratagene). Recombinant plaques were screened with a repeat-specific oligonucleotide probe [2] and the DNA from plaque-purified positive clones was subcloned into the EcoRI site of pUC9 using standard techniques [10]. The DNA sequence was determined using plasmid-primed dideoxy sequencing (Sequenase, US Biochemical) with synthetic oligonucleotides corresponding to the Wellcome CSP gene sequence [2], spaced at approximately 150-bp intervals. The sequence was determined in both directions except for the repeat region.

The DNA and deduced amino acid sequences of the T9-98 (393 amino acids) and T9-101 (424 amino acids) genes are shown in Fig. 1. The 5' coding region of T9-98, like that of NF54, shows a 57-bp deletion [5,11] compared to the strains T9-101, 7G8 and LE5 [4]. The CSP gene of the Wellcome [2] and B I 1 strains [4] contains a 30-bp insertion at this position. N I, one of three regions which are a focus for non-synonymous point mutations [3], varies between T9-98 and T9-101 at a single nucleotide (GGG→GCG, Gly→Ala). The sequences are identical two codons downstream, where in B I 1 a GGT→GAT mutation results in a Gly→Asp change. The two genes also differ in Th2R and Th3R, the two polymorphic T-cell epitope domains [3,6], in each case by a single nucleotide resulting in an amino acid change.

The central repeat region of the T9-101 CSP gene, like that of NF54, encodes 44 repeats. By contrast, the T9-98 gene has the same number (41) and organization of repeats as that of 7G8 [1]. The repeat region of the former differs in not displaying the silent substitutions of 7G8 in units 15, 16 and 19, but is otherwise identical with 37

Correspondence address: Michael J. Lockyer, Dept. of Molecular Biology, Wellcome Biotech, Beckenham, Kent, U.K.

Note: Nucleotide sequence data reported in this paper have been submitted to the GenBank™ database with the accession numbers M57498 and M57499.

Abbreviations: CSP, circumsporozoite protein; bp, base pairs.

0166-6851/91/$03.50 © 1991 Elsevier Science Publishers B.V. (Biomedical Division)
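The length and codon figures quoted in the text can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only and is not part of the original analysis. It assumes that each repeat unit is a tetrapeptide (NANP or NVDP) and that the 57-bp deletion and the repeat number are the only sources of length difference between the two clones. All constant and function names are hypothetical, and the codon table is deliberately limited to the four codons named in the text.

```python
# Figures quoted in the text for the two T9 clones.
LEN_T9_101_AA = 424      # deduced protein length, T9-101
LEN_T9_98_AA = 393       # deduced protein length, T9-98
DELETION_BP = 57         # 5' deletion present in T9-98
REPEATS_T9_101 = 44
REPEATS_T9_98 = 41
REPEAT_UNIT_AA = 4       # assumption: NANP / NVDP tetrapeptide units


def length_difference_breakdown():
    """Return (deletion_aa, repeat_aa, total_aa) predicted from the text."""
    deletion_aa = DELETION_BP // 3  # 3 nucleotides per codon
    repeat_aa = (REPEATS_T9_101 - REPEATS_T9_98) * REPEAT_UNIT_AA
    return deletion_aa, repeat_aa, deletion_aa + repeat_aa


# Partial codon table: only the four codons mentioned in the text.
CODON_TO_AA = {"GGG": "Gly", "GCG": "Ala", "GGT": "Gly", "GAT": "Asp"}


def describe_substitution(ref_codon, alt_codon):
    """Summarise a single-nucleotide codon change (hypothetical helper)."""
    changed = [
        i + 1
        for i, (a, b) in enumerate(zip(ref_codon, alt_codon))
        if a != b
    ]
    if len(changed) != 1:
        raise ValueError("expected exactly one nucleotide difference")
    ref_aa, alt_aa = CODON_TO_AA[ref_codon], CODON_TO_AA[alt_codon]
    kind = "synonymous" if ref_aa == alt_aa else "non-synonymous"
    return {
        "codon_position": changed[0],
        "change": f"{ref_aa}->{alt_aa}",
        "kind": kind,
    }


if __name__ == "__main__":
    print(length_difference_breakdown())
    print(LEN_T9_101_AA - LEN_T9_98_AA)
    print(describe_substitution("GGG", "GCG"))
    print(describe_substitution("GGT", "GAT"))
```

Under these assumptions the deletion contributes 19 residues and the three additional repeat units 12, which together equal the 31-residue difference between the 424- and 393-residue proteins. Both quoted substitutions fall at the second codon position and change the encoded amino acid.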

[Fig. 1. Sequence listing: nucleotide sequence of the CSP gene with the deduced amino acid sequence beneath; repeat units numbered 1-44.]

Fig. 1. The DNA and deduced amino acid sequence of the CSP gene of the P. falciparum clones T9-98 and T9-101. The main sequence shown corresponds to T9-101. The conserved Regions I-III are overlined, the repeats are numbered, the major repeats are underlined and the minor repeats double underlined. The T-cell epitope domains Th2R and Th3R are delineated by hatched underlining. The 57 bp deleted in T9-98 is enclosed by brackets. Nucleotide and amino acid substitutions in T9-98 are shown above the sequence. In the repeat region, the repeats from both clones were aligned from the 3' end to maximize homology (units 23-44 in T9-101 with units 20-41 in T9-98). It is a
