IAI Accepts, published online ahead of print on 6 October 2014 Infect. Immun. doi:10.1128/IAI.02537-14 Copyright © 2014, American Society for Microbiology. All Rights Reserved.
Anaplasma marginale superinfection attributable to pathogen strains with distinct genomic backgrounds

Eduardo Vallejo Esquerra1, David R. Herndon2, Francisco Alpirez Mendoza3,§, Juan Mosqueda4, and Guy H. Palmer1*

1 Paul G. Allen School for Global Animal Health, Washington State University, Pullman, Washington 99164 USA;
2 Animal Diseases Research Unit, Agricultural Research Service, United States Department of Agriculture, Pullman, Washington 99164 USA;
3 Programa de Salud Animal, INIFAP-CIRGOC, La Posta, Veracruz, México;
4 Universidad Autónoma de Querétaro, Campus Juriquilla, México.

*Corresponding Author
Paul G. Allen School for Global Animal Health, Washington State University, Pullman, Washington 99164-7090
Tel: (509) 335-6033
Fax: (509) 335-6328
E-mail: [email protected]

§ Dr. Alpirez is deceased.

Abstract

Strain superinfection occurs when a second pathogen strain infects a host already infected with a primary strain. Understanding the selective pressures that drive strain divergence, which underlies superinfection, and that allow penetration of a new strain into a host population is a critical knowledge gap relevant to shifts in infectious disease epidemiology. In endemic regions with high prevalence of infection, broad population immunity develops against Anaplasma marginale, a highly antigenically variant rickettsial pathogen, and creates strong selective pressure for emergence of and superinfection with strains that differ in their Msp2 variant repertoire. The strains may emerge either by msp2 locus duplication and allelic divergence on an existing genomic background or by introduction of a strain with a different msp2 allelic repertoire on a distinct genomic background. To distinguish between these possibilities, we developed a multi-locus typing assay, based on high throughput sequencing of non-msp2 target loci, to distinguish among strains with different genomic backgrounds. The technical error level was statistically defined based on the percentage of perfect sequence matches of clones of each target locus and validated using experimental single strains and strain pairs. Testing of A. marginale positive samples from endemic tropical regions identified individual infections that contained unique alleles for all five targeted loci. The data revealed a highly significant difference in the number of strains per animal in the tropical regions as compared to infections in temperate regions and strongly supported the hypothesis that transmission of genomically distinct A. marginale strains predominates in high prevalence endemic areas.

Introduction

Microbial strain structure is dynamic over space and time. For pathogens, shifts in structure characterized by emergence of genotypically unique strains and loss of others result in changing patterns of disease, whether within an individual or a population. However, the scale of space and time differs markedly among pathogens depending on multiple factors including pathogen-specific mechanisms of genetic change and effective pathogen population size. For example, RNA viruses such as lentiviruses are capable of rapidly generating mutations throughout the genome due to the large number of progeny emanating from a single virus within an infected cell and the minimal proofreading during replication (1-3). In contrast, bacteria and eukaryotic parasites have both a limited number of progeny per replicative cycle and stricter control on genomic fidelity between generations (4,5). These differences in the time required to generate genetic variants aside, the successful emergence of a new genotypically unique strain is dependent on the strength of the selective pressure acting on the pathogen population (6).

Immunity is a strong selective pressure for newly emergent genetic variants. Within an individual host the immune response selects for new variants; for viral pathogens this includes true genomic variants whereas higher order microbial pathogens most typically use differential expression within a more stable genome (3-5,7,8). However, at the host population level, immunity is also a strong selective pressure and has been postulated to be a driver for diversification in strain structure across the spectrum of microbial pathogens (9,10). In this model, the development of immunity within a temporally and spatially defined host population against an existing strain provides strong selection for an antigenically distinct strain that infects the host population regardless of the pre-existing immunity (11).

We have been investigating how infection prevalence and associated population immunity drive strain structure using Anaplasma marginale, an α-proteobacterium that infects wild and domestic ruminants (12). A. marginale establishes persistent infection within an animal by antigenic variation in the immunodominant surface protein, Major Surface Protein (MSP)-2, which is generated via gene conversion using a genomic repertoire of distinct msp2 alleles (13-16). Genomic comparison of multiple strains revealed that although A. marginale has a “closed core” genome with essentially the same gene content among all strains, the repertoire of msp2 alleles is distinct (17,18). Importantly, strains with a distinct msp2 allelic repertoire have been shown experimentally to be capable of strain superinfection, in which a second strain establishes infection in an animal already infected with a primary strain (15,19,20). Superinfection requires that the second strain contain at least one unique msp2 allele and that this be expressed at superinfection (19,21). Consistent with these experimental observations, we have more recently identified a significantly higher number and diversity of expressed msp2 alleles in A. marginale infections from high prevalence, high population immunity tropical regions as compared to low prevalence, low population immunity temperate regions (22).

The increase in msp2 allelic diversity in the high prevalence tropical regions may arise from two different scenarios (Fig. 1). In the first, the theoretical strain A establishes broad population immunity and thus selects for a distinct strain B that has a unique msp2 allelic repertoire. Alternatively, strain A may expand its allelic repertoire on the same genomic background to generate a closely related strain A’. This latter possibility is supported by msp2 allelic gene duplication within a single strain and the presence of differing numbers of msp2 loci and alleles among strains (16-19). Either strain A’ or B would generate the observed increase in the number and diversity of expressed msp2 alleles within the high prevalence tropical regions (22). In the present study, we address this question by developing and validating a multi-locus strain typing approach and testing whether strain superinfection in tropical regions is linked to strains with the same or different genomic backgrounds.

Materials and Methods

Identification of loci with strain-specific alleles: Target loci that had previously been shown to be broadly conserved among strains (17,18,23) were initially screened for strain-specific alleles using the complete genome sequences of the St. Maries (GenBank accession no. CP000030) and Florida strains (CP0011079). Candidates were then screened against the whole-genome shotgun data contigs from the Oklahoma (AFMX00000000.1), Washington-Okanogan (AFMW00000000.1) and South Idaho strains (AFMY00000000.1) with the corresponding target loci using blastn, blastx and tblastn analysis. Target sequences across the five strains were compiled using the VECTOR NTI software suite (Invitrogen) and single nucleotide polymorphisms (SNPs) and insertion/deletions (indels) identified. Forward and reverse primers were designed for the identical conserved regions flanking the most informative SNPs and indels: omp5, 5’-CTGGAAAGCTGCACAGGATG-3’ and 5’-CGACGCTTCCGCAAACATAC-3’; omp9, 5’-GGCAATTCCAATCATGTGCG-3’ and 5’-CAAGCTGTGAAGTCACTACACG-3’; omp12, 5’-CTAGCGCTATGTTGCATGCATC-3’ and 5’-ACGCAAATTCAGATCACAGGG-3’; omp13, 5’-CAAGCAGATCCACAGCATCAATTC-3’ and [...] and 5’-CCACTTATTTCCACAATCTCCATGC-3’ (Supplemental Figure 1).
[...] regions are all [...]

Independent replicates of an uncloned strain (St. [...]

[...] (>18 months) pastured in high prevalence, tropical regions in the Mexican states of Veracruz and Chiapas (26,27). Samples were collected in Medellin, Veracruz (19° 03' N, 96° 09' W; annual precipitation of 1,418 mm), La Concordia, Chiapas (16° 07' N, 92° 41' W; annual precipitation of 1,450 mm), and Tonala, Chiapas (16° 06' N, 93° 45' W; annual precipitation of 1,953 mm). Samples from temperate areas were obtained from adult cattle (>18 months) pastured in temperate areas in the U.S.: Tonasket, Washington (48° 42' N, 119° 26' W; annual precipitation of 383 mm) and Burns, Oregon (43° 35' N, 119° 03' W; annual precipitation of 278 mm). Cattle had not been vaccinated to prevent A. marginale infection nor treated to clear infection. All samples were confirmed positive using either direct detection of A. marginale bacteremia or serological screening using msp5 C-ELISA (VMRD). DNA was extracted, processed, and sequenced as described above.

Results

Identification of loci with strain-specific alleles: Loci that would define the stable genomic background of a strain independent of the variable msp2 allelic repertoire were identified by screening the complete genome sequences of the St. Maries (GenBank CP000030) and Florida strains (CP0011079). Five omp loci were identified that encoded alleles specific to each strain (18,23). The single nucleotide substitutions between the two strains ranged from 1-46, with nucleotide differences due to indels ranging from 3-21 (Table 1). Each of these omps had previously been shown to be invariant within a strain during long-term persistent infection and among different animals infected with the same strain (18). The candidate loci were then compared among three additional strains, South Idaho, Washington-Okanogan, and Oklahoma, using sequences that had been generated by shotgun sequencing and previously deposited in GenBank (AFMY00000000.1, AFMW00000000.1, AFMX00000000.1). While allelic identity was observed among individual loci between specific strain pairs, the inclusion of five target loci ensured that there were differences at multiple loci between any given strain pair (Table 1). Importantly, three of these strains (St. Maries, South Idaho, Washington-Okanogan) are from within the same region in the northwest U.S. and are transmitted both naturally and experimentally by the same vector, Dermacentor andersoni (25, 28-30), suggesting that the ability of the five targets to distinguish among strains is conserved even among closely related strains. Primers were designed in the conserved regions among these five strains to encompass the largest number of strain-specific SNPs and indels (Table 2; Supplemental Figure 1).
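
The screening logic described above (find positions that differ among strains, then look for invariant flanking stretches that could carry primers) can be illustrated with a minimal sketch. It assumes the sequences are already aligned; the toy sequences, strain labels, and function names are hypothetical and are not omp data or the software used in this study.

```python
from typing import Dict, List, Tuple

def variable_sites(alignment: Dict[str, str]) -> List[int]:
    """Return 0-based alignment columns where strains differ (SNPs or indel gaps)."""
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be pre-aligned to equal length")
    return [i for i in range(length) if len({s[i] for s in seqs}) > 1]

def conserved_runs(alignment: Dict[str, str], min_len: int) -> List[Tuple[int, int]]:
    """Return half-open (start, end) intervals of invariant columns >= min_len long."""
    length = len(next(iter(alignment.values())))
    variable = set(variable_sites(alignment))
    runs: List[Tuple[int, int]] = []
    start = None
    # Iterate one past the end so a run touching the last column is closed.
    for i in range(length + 1):
        invariant = i < length and i not in variable
        if invariant and start is None:
            start = i
        elif not invariant and start is not None:
            if i - start >= min_len:
                runs.append((start, i))
            start = None
    return runs

# Hypothetical toy alignment ("-" marks an indel); not real omp sequence.
toy = {
    "strain_A": "ACGTACGTTAGC-TACGATCG",
    "strain_B": "ACGTACGTCAGCGTACGATCG",
    "strain_C": "ACGTACGTTAGC-TACGATCG",
}
print(variable_sites(toy))
print(conserved_runs(toy, min_len=6))
```

In this toy case the two variable columns are bracketed by two invariant stretches, which is the arrangement a primer pair would need in order to amplify across the informative positions.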
Identification of single strain infections: The identity of each strain was accurately detected by high throughput sequencing of the target alleles with the predicted allele represented by perfect matches. The paired read consensus sequences obtained for each allele ranged from 2,229 to 8,080: omp5 (2,769 sequences in the Florida strain to 5,931 in the St. Maries strain); omp9 (2,229, Florida to 3,083, Washington-Okanogan); omp12 (2,908, St. Maries to 4,101, Florida); omp13 (3,342, Florida to 4,072, St. Maries); omp14 (4,371, Washington-Okanogan to 8,080, Florida). Perfect matches (those without a single nucleotide mismatch) were >90% for all alleles in all strains with a range from 90.4% for omp12 to 94.9% for omp9, both in the Washington-Okanogan strain. The percentage of perfect matches was independent of the number of sequences generated: for the Florida strain there were 94.2% perfect match sequences out of 2,100 total for omp9 and 92.7% perfect match sequences out of 8,080 total for omp14. This percentage of perfect match sequences was higher than previously reported for a comparison set of four microbial genomes using MiSeq, reflecting the inclusion of only consensus sequences with no disagreement in overlap (31). To determine the “technical error level” associated with sequencing of the five loci, clones of each target allele were obtained for the St. Maries strain and sequenced using the same procedure in three biological replicates. The percentages of perfect match sequences for the cloned alleles were: omp5, 93.8±0.9; omp9, 93.9±1.3; omp12, 92.4±0.4; omp13, 93.8±0.5; and omp14, 93.8±1.1. Thus the higher percentage of perfect match sequences for the five alleles as compared to previously published genomes (31) is reflected in the higher percentage using cloned omps, consistent with the designed sequence lengths being appropriate for strain differentiation using a multi-locus approach. The percentage of perfect match sequences for each allele was not significantly different between cloned omp and the same omp in the uncloned St. Maries strain: omp5, p=0.2; omp9, p=0.2; omp12, p=0.4; omp13, p=0.1; omp14, p=0.2 (two-sample t-test assuming unequal variances). The criterion for identifying an allele was set at a proportion of perfect match sequences that exceeded the technical error level by a minimum of three standard deviations.

Identification of multiple strain infections: To determine if the multi-locus approach would distinguish among strains, strain pairs were mixed to mimic strain superinfection and then sequenced. In each of three strain pairs, St. Maries and Florida, St. Maries and Washington-Okanogan, and Florida and Washington-Okanogan, each strain could be definitively identified (Table 3). When the omp allele differed between the two paired strains, the two alleles each were detected with a percentage of perfect matches that exceeded the technical error level in all cases. When the omp allele was common between the two paired strains (omp5, St. Maries and Washington-Okanogan; omp12 and omp13, Florida and Washington-Okanogan), the common allele was detected with a percentage of perfect match sequences that exceeded 90% (Table 3).
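
The allele-calling criterion used for single and mixed strain samples can be sketched as follows. This is one possible reading rather than the authors' pipeline: it assumes that the technical error level for a locus is 100% minus the mean perfect-match percentage of the cloned allele, and that a candidate allele is called when its share of reads exceeds that error level plus three standard deviations. The replicate values and read counts are hypothetical, chosen only to resemble the magnitudes reported for omp5.

```python
from statistics import mean, stdev
from typing import Dict, List

def call_threshold(clone_perfect_pct: List[float], n_sd: float = 3.0) -> float:
    """Minimum read percentage for calling an allele.

    Assumed rule: technical error level (100 - mean perfect-match %)
    plus n_sd sample standard deviations of the replicate percentages.
    """
    error_level = 100.0 - mean(clone_perfect_pct)
    return error_level + n_sd * stdev(clone_perfect_pct)

def call_alleles(perfect_counts: Dict[str, int], total_reads: int,
                 threshold_pct: float) -> List[str]:
    """Return alleles whose perfect-match reads exceed threshold_pct of all reads."""
    return sorted(
        allele for allele, n in perfect_counts.items()
        if 100.0 * n / total_reads > threshold_pct
    )

# Hypothetical perfect-match percentages from three replicates of a cloned allele.
clone_replicates = [92.9, 93.8, 94.7]
threshold = call_threshold(clone_replicates)

# Hypothetical perfect-match read counts per candidate allele in a mixed sample.
sample_counts = {"allele_1": 2600, "allele_2": 2100, "allele_3": 40}
called = call_alleles(sample_counts, total_reads=5000, threshold_pct=threshold)
print(round(threshold, 1), called)
```

Under these assumptions a rare sequence that accounts for under 1% of reads falls below the threshold and is treated as technical error, while the two abundant alleles are both called.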
Strain composition in naturally occurring superinfections: Infections from two tropical regions in Mexico, Veracruz and Chiapas, were examined. The infection prevalence within these high humidity, high temperature regions has been reported as the highest in Mexico, consistent with strong selective pressure for strain superinfection (26,27). Multi-locus typing revealed multiple omp alleles in at least one locus for 21/23 tropical region infections: 16/16 infections obtained in Veracruz and 5/7 from Chiapas (Supplemental Table 1). The number of unique alleles per infection, both for each individual locus and collectively across all loci, was significantly greater for tropical region infections as compared to temperate region infections (Table 4). This reflected both an increased percentage of loci with more than a single allele and an increase in the number of alleles at each locus (Table 4). Correspondingly, the maximal mean number of strains per animal in the tropical region was 3.3±1.2 with up to six strains in a single infected animal. This was significantly higher (p=0.0004, Student's unpaired t-test) in the tropical regions as compared to infections in two temperate regions, with a maximal mean of 1.3±0.5 with a maximum of two strains in an individual infected animal.
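
One way to read the per-animal strain estimate is sketched below. It assumes, as an interpretation not spelled out in the text, that the number of strains in an infection is taken from the typed locus with the most alleles, since single strain infections showed one allele per locus. The infection records are hypothetical and are not data from this study.

```python
from statistics import mean, stdev
from typing import Dict, List

LOCI = ("omp5", "omp9", "omp12", "omp13", "omp14")

def strains_in_infection(alleles_per_locus: Dict[str, int]) -> int:
    """Estimate strain number as the largest allele count across the typed loci."""
    return max(alleles_per_locus[locus] for locus in LOCI)

# Hypothetical allele counts per locus for three infections.
infections: List[Dict[str, int]] = [
    {"omp5": 2, "omp9": 3, "omp12": 3, "omp13": 2, "omp14": 4},
    {"omp5": 1, "omp9": 1, "omp12": 1, "omp13": 1, "omp14": 1},
    {"omp5": 2, "omp9": 2, "omp12": 1, "omp13": 2, "omp14": 2},
]
estimates = [strains_in_infection(record) for record in infections]
print(estimates, round(mean(estimates), 2), round(stdev(estimates), 2))
```

Group means and standard deviations computed this way for tropical and temperate infections are the kind of summary that an unpaired t-test would then compare.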
Discussion

We accept that individual animals within endemic regions are infected with multiple strains of different genetic backgrounds. The following four lines of evidence support this conclusion. First, the multi-locus typing assay detected only one allele per target omp locus when applied to experimental single strain infections. Second, the assay detected the two predicted alleles in experimental paired strain infections and, when there was a common allele between two strains, only the common allele was detected. Unique alleles were detected with percentages of perfect sequence matches that exceeded the mean plus 3 standard deviations of the technical error level as determined for cloned omps and as represented by the omps in experimental single strain infections. Third, individual infections from the high prevalence tropical regions were identified that contained unique alleles for all five target omp loci. As each omp locus in itself represents an independent detection event, the probability that the unique alleles for all five target loci reflect sequencing error is extraordinarily low (p [...]

Locus        Region      [...] (maximum number)   [...]    >2 alleles at locus
omp5 (a)     Tropical    2.4±1.0 (4)              19/23    10/23
             Temperate   1.3±0.5 (2)              3/8      0/8
omp9 (a)     Tropical    2.8±1.2 (5)              19/23    13/23
             Temperate   1.0±0.0 (1)              0/4      0/4
omp12 (a)    Tropical    2.7±1.1 (5)              18/23    13/23
             Temperate   1.1±0.4 (2)              1/8      0/8
omp13 (a)    Tropical    2.2±1.0 (4)              18/23    8/23
             Temperate   1.1±0.4 (2)              1/8      0/8
omp14 (a)    Tropical    2.7±1.4 (6)              17/23    12/23
             Temperate   1.1±0.4 (2)              1/8      0/8

(a) The greater number of alleles per tropical region infection as compared to the temperate region infection was statistically significant for all loci collectively, p [...]