Environmental Microbiology Reports
Accepted Article
Tle Distribution and Diversity in Metagenomic Datasets Reveal Niche Specialisation.
Egan, F.1, Reen, F.J.1 and O’Gara, F.*1,2
1 BIOMERIT Research Centre, School of Microbiology, University College Cork, Cork, Ireland.
2 Curtin University, School of Biomedical Sciences, Perth, WA, Australia.
Running Title: Tle niche specialisation.
This article has been accepted for publication and undergone full peer review but has not been through the copyediting, typesetting, pagination and proofreading process, which may lead to differences between this version and the Version of Record. Please cite this article as doi: 10.1111/1758-2229.12222 This article is protected by copyright. All rights reserved.
* Correspondence: Prof. Fergal O’Gara, BIOMERIT Research Centre, Department of Microbiology, University College Cork, Ireland. Tel: +353 (0)21 427 2097. Fax: +353 (0)21 427 5934. E-mail: [email protected].
Summary
The existence of microbial communities and the complex interactions that govern their dynamics have received considerable attention in recent years. Advances in genomic sequencing technologies have greatly enhanced our understanding of ‘what is there’. However, the question as to ‘what are they doing’ remains less well defined. The continual development of the genomic and metagenomic sequence databases provides an exciting opportunity to interrogate the distribution and prevalence of key microbial systems across a diverse set of ecosystems. The widely distributed Type Six Secretion System (T6SS) has been shown to play a significant role in bacterial-bacterial and bacterial-host interactions. While several T6SS effectors have been shown to target the cell wall and membrane of competing cells, little is known about the roles these proteins play in different ecosystems. Therefore, the prevalence of a key T6SS effector superfamily known as type six lipase effectors (Tle) was studied in over 2000 metagenomic datasets representing diverse ecosystems and host niches. Increased Tle representation in environmental categories strongly supports the hypothesis of niche specialisation and suggests that these effectors may play important niche-specific roles.
Introduction
Bacterial interactions underpin community function, and are often dependent on secretion systems. In Gram-negative bacteria, six specialised secretion systems have been described, Type I to Type VI (T1SS-T6SS). The most recently uncovered system, T6SS, has emerged as a highly significant factor of the bacterial interactome (Russell, Peterson, et al., 2014). Several reports suggest a role for T6SS in attachment/biofilm formation, the general stress response, osmotolerance, and maintenance of pH homeostasis by H+ ion secretion (Enos-Berlage et al., 2005; Aschtgen et al., 2008; Weber et al., 2009; Gueguen et al., 2013; Zhang et al., 2013). However, it is most widely considered as a weapon of interbacterial warfare, being involved in virulence towards both prokaryotic and eukaryotic organisms (Filloux, 2013). Of particular interest is the role of T6SS in moderating population dynamics within an ecosystem. The growing awareness of the polymicrobial communities that exist and thrive in most ecological and clinical niches has heightened the interest in microbial factors that can control population dynamics. While experimental models that facilitate the investigation and dissection of polymicrobial interactomes continue to emerge, the explosion in genomic and metagenomic datasets provides an ideal opportunity to gain significant insights into the physiological role of these secretion systems within communities.
Secretion system abundance (including T6SS abundance) in metagenomes has been studied previously (Persson et al., 2009; Barret et al., 2013), and it appears that T6SS is over-represented in niche-specific environments. Importantly however, given the diverse functionality of T6SS, it is not clear to what extent this abundance reflects the importance of T6SS-mediated killing in these environments, a process which is carried out by the action of effector molecules. Several T6SS effector superfamilies have been identified to date: T6S amidase effectors (Tae), T6S glycoside hydrolase effectors (Tge), T6S lipase effectors (Tle) and T6S DNase effectors (Tde) (Ma et al., 2014; Russell, Peterson, et al., 2014). Tae and Tge target the bacterial cell wall, Tle are phospholipases which target the cell membrane, and Tde target DNA (Russell et al., 2013). Furthermore, individual families within these superfamilies have specific sites of action due to their own characteristic enzymatic activities. For example, Tle1, Tle2 and Tle5 have PLA2, PLA1 and PLD activity, respectively (Russell et al., 2013).
Of these effector superfamilies, Tle appears to be most widespread in genomic sequences, being encoded in a broad spectrum of bacterial species, including a range of emerging and established pathogens (Russell et al., 2013). Moreover, while all T6SS effector superfamilies have been shown to function in interbacterial competition, Tle has also been shown to be involved in virulence towards eukaryotes. Indeed, the first example of a Tle effector secreted in a T6SS-dependent manner was Tle2VC/TseL from Vibrio cholerae, which was found to bind to VgrG and was necessary for efficient killing of amoeba (Dong et al., 2013). Another study identified Tle genes within genomic regions whose disruption resulted in loss of virulence to diverse eukaryotic hosts including macrophage, mice, amoeba, and insects (Waterfield et al., 2008). Two recent publications also demonstrate that Tle proteins can mediate bacterial-eukaryote interactions (Jiang et al., 2014; Lery et al., 2014). Similar to the T6SS itself, multiple tle genes can be encoded in the same genome, and these are usually of distinct phylogenetic origins. For example, five Tle genes are present in Pseudomonas aeruginosa PA14 (Tle1, Tle3, Tle4 and two copies of Tle5). The existence of multiple Tle families and the fact that several can be found within the same genome suggest that each family may have some degree of specialisation. While the membrane functionality of these effectors within microbial genomes appears to have arisen through convergence, a majority of Tle are encoded on horizontally acquired island regions, linked to vgrG loci (Barret et al., 2011). Though sequence conservation may be limited between families, niche specialisation might provide some insight into the role of these proteins in shaping microbial populations in diverse environments.
Therefore, in this study we investigated the distribution and abundance of tle genes in genomic and metagenomic databases. The relatively widespread nature of the T6SS and Tle makes the Tle a good candidate for a highly focused study on bacterial-mediated killing in different ecosystems, and potentially a good proxy for the level of competition within an environment. Tle distribution was distinct from T6SS distribution and each family of Tle exhibited a distinct spectrum of distribution across the individual ecosystems. Several niche-specific signatures were identified, with a particular emphasis on Tle5 abundance in human samples. Furthermore, changes in conserved Tle residue motifs between niches provide additional evidence of selection, although biochemical and genetic analyses will be required to evaluate their importance. Taken together, these data strongly support a role for Tle effector proteins in moderating microbial community structure in a broad spectrum of ecosystems, marking them as key targets for therapeutic development.
Results and discussion
Uncovering Tle diversity in genomic and metagenomic datasets
Tle proteins are emerging as key factors in mediating microbial-microbial and microbial-host interactions, suggesting a role in shaping microbiome structures across a wide spectrum of ecological niches. Analysis of the distribution and abundance of Tle genes in a range of diverse ecosystems would provide important insights into the functionality of this major class of secretion effector and uncover the potential for niche-specialisation. The recent dramatic increase in genome and metagenome sequences being made publicly available provided an ideal opportunity to pursue this goal. The IMG database now contains over 2000 metagenomes, many of which can be assigned to the broader environmental categories: marine, fresh, human, arthropod, rhizosphere/soil, and engineered. Tle occurrence was determined for each environment. No Pfam or COG domains were both specific and exclusive to the various Tle families. Therefore, to avoid false positives and false negatives, databases were searched for Tle sequences using the program BLASTP on the IMG database. To establish the veracity of this method and generate bait sequences, multiple rounds of BLASTP analysis were performed using genomic datasets on IMG. Results were limited to hits encoded adjacent to a vgrG gene using the gene neighbourhood tool in IMG. After several iterations, a list of baits representing the smallest number of diverse Tle needed to obtain all Tle hits from genomic data was generated (File S1). As this approach is based on already described Tle, any highly divergent members of known Tle families, or members of currently unidentified Tle families, would not be detected in this analysis.
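The bait-reduction step described above can be pictured as a set-cover problem. The sketch below is not the authors' procedure, which relied on repeated BLASTP rounds within IMG. It is a greedy approximation that assumes each candidate bait has already been mapped to the set of vgrG-adjacent hits it retrieves. The function name, bait names and hit identifiers are all hypothetical.

```python
def select_baits(hits_by_bait):
    """Greedily pick baits until every known hit is retrieved by at least one bait.

    hits_by_bait maps a candidate bait ID to the set of hit IDs it retrieves.
    Greedy set cover is an approximation; the result is not guaranteed minimal.
    """
    uncovered = set().union(*hits_by_bait.values()) if hits_by_bait else set()
    chosen = []
    while uncovered:
        # Take the bait retrieving the most still-uncovered hits (ties: lowest ID).
        best = max(sorted(hits_by_bait),
                   key=lambda bait: len(hits_by_bait[bait] & uncovered))
        chosen.append(best)
        uncovered -= hits_by_bait[best]
    return chosen


# Hypothetical example: three candidate baits and the hits each one retrieves.
example = {
    "bait_A": {"h1", "h2", "h3"},
    "bait_B": {"h3", "h4"},
    "bait_C": {"h4"},
}
selected = select_baits(example)
```

In this toy case two baits suffice to recover all four hits; a real bait list would also need manual checks that each retained hit is a genuine Tle.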
Previous analysis of secretion system abundance in metagenomes normalised incidence by the number of predicted Proteobacterial genes in the dataset, as the most studied secretion systems were overwhelmingly present in this phylum (Barret et al., 2013). This ensured that secretion system frequency was not simply an artefact of Proteobacterial frequency or the different amounts of sequence data in each metagenome. As genomic analysis of Tle occurrence revealed that Tle are also predominantly encoded in Proteobacterial genomes, this method of normalisation was also employed in this study. As expected, the overall genomic distribution of Tle was similar to the previously reported genomic distribution of T6SS, being predominantly present in Proteobacteria, and also occasionally found in Acidobacteria and Planctomycetes. In absolute terms, 2078 Tle were identified in metagenomes and the numbers of Tle in an environmental category ranged from 31 in soil to 1059 in human. In relative terms the numbers of Tle ranged from approximately 1 per 4000 prokaryote genes in arthropod metagenomes to 1 per 700,000 prokaryote genes in soil metagenomes.
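The normalisation and the "1 per N genes" figures above amount to simple ratios. The following minimal sketch shows the arithmetic only; the function names and the counts in the example are hypothetical and are not taken from the datasets analysed here.

```python
def tle_per_proteobacterial_gene(tle_count, proteobacterial_genes):
    """Normalise a Tle count by the number of predicted Proteobacterial genes."""
    if proteobacterial_genes <= 0:
        raise ValueError("proteobacterial_genes must be positive")
    return tle_count / proteobacterial_genes


def genes_per_tle(tle_count, gene_count):
    """Express a frequency as '1 Tle per N genes' by returning N."""
    if tle_count <= 0:
        raise ValueError("tle_count must be positive")
    return gene_count / tle_count


# Hypothetical counts for a single environmental category.
normalised = tle_per_proteobacterial_gene(50, 200_000)
one_in_n = genes_per_tle(50, 200_000)
```

Dividing by the Proteobacterial gene count rather than by total sequence means that a category with few Proteobacteria is not automatically scored as Tle-poor.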
Environment type determines Tle numbers
Perhaps unsurprisingly given the diversity of the environmental categories studied, considerable differences in total Tle frequency were evident between these metagenome sets (Fig. 1).
It is particularly notable that the arthropod, rhizosphere and human metagenomes, all host-associated niches, contain a large abundance of tle genes, while Tle is relatively infrequent in aquatic environments and in bulk soil. This may simply reflect the fact that Tle are involved in interactions with eukaryotes, or Tle abundance may reflect the level of activity/competition within these niches. Indeed, previous analysis of aquatic metagenomes found that secretion system genes were much more prevalent in productive waters (Persson et al., 2009). However, a role for Tle proteins in the marine and soil ecosystems cannot be ruled out. Other factors may also contribute to a lower Tle abundance. The lack of Tle in aquatic environments might be due to the lack of activity within this niche or could possibly reflect that open water niches in aquatic environments are not conducive to a contact-dependent method of bacterial killing. Greater sampling of sediments or sites of high bacterial concentrations from these environments may result in greater Tle representation in aquatic environments, though in the limited number of sediment-based aquatic metagenomes Tle genes are not more abundant. One caveat to these results is that certain sites can dominate an environmental category. For example, of the freshwater metagenomes, roughly 45% of the prokaryote genes and 90% of T6SS come from a single site: wetland microbial communities from Twitchell Island in the Sacramento Delta. Despite this, only 12% of freshwater Tle comes from this source. A closer look at the contribution of the various metagenomes to the Tle frequency in their environmental category reveals interesting observations. Fungal-associated arthropod metagenomes make up about 5.6% of total arthropod metagenome DNA, but account for at least 25% of all Tle, suggesting that Tle may be involved in bacterial-fungal interactions.
Within human metagenomes, Proteobacteria are much more common in the oral niche than in the stool (Fig. S1), with Tle frequency being significantly higher in the mouth than in the gut, even allowing for the differential Proteobacterial representation. Indeed, the latter niche contributes less than 20% of the amount of Tle that would be expected based on its Proteobacterial gene count. Whether Tle should be considered overrepresented in the mouth or underrepresented in the gut is a matter of perspective, but these data certainly support a role for Tle proteins in contributing to the bactericidal activity in the oral niche.
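The comparison of observed and expected Tle numbers can be illustrated with a short calculation. This sketch assumes that the expected count for a niche is the category-wide Tle total scaled by that niche's share of Proteobacterial genes; the exact calculation used in the study is not given in the text, and the function name and figures below are hypothetical.

```python
def observed_vs_expected(observed_tle, niche_proteo_genes,
                         total_tle, total_proteo_genes):
    """Return observed / expected Tle for one niche within a category.

    Expected Tle is assumed proportional to the niche's share of
    Proteobacterial genes in the whole category.
    """
    if total_proteo_genes <= 0 or niche_proteo_genes <= 0 or total_tle <= 0:
        raise ValueError("counts must be positive")
    expected = total_tle * niche_proteo_genes / total_proteo_genes
    return observed_tle / expected


# Hypothetical category: 1000 Tle over 1,000,000 Proteobacterial genes,
# with one niche holding 400,000 of those genes but only 60 Tle.
ratio = observed_vs_expected(60, 400_000, 1000, 1_000_000)
```

A ratio well below 1, as in this invented example, corresponds to a niche contributing far fewer Tle than its Proteobacterial gene count would predict.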
Tle abundance differs from T6SS abundance
The niche-differential abundance of Tle proteins suggests an important and specific role for these effectors in polymicrobial environments. While Tle function would appear to be specific, the role of the T6SS itself is more functionally diverse. Therefore, we considered the possibility that Tle abundance would not necessarily reflect T6SS abundance. To test this hypothesis the correlation between T6SS frequency and Tle frequency in the datasets was examined. T6SS frequency in the various environments was assessed using a previously published method (Barret et al., 2013) which was applied to new metagenomic datasets.
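The text does not state which correlation measure was used. As one possible illustration, the sketch below computes a Pearson coefficient between per-environment T6SS and Tle frequencies; the choice of Pearson, the function name and any input values are assumptions rather than details of the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of cross-products and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)
```

Each list would hold one normalised frequency per environmental category, in the same category order. With only six categories, any coefficient should be interpreted cautiously.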
As shown in Fig. 2, the abundance of T6SS does not reflect the abundance of Tle, as T6SS numbers are relatively similar across the metagenome datasets. The human metagenomes have the highest amount of T6SS, but T6SS is only 2.4 times more abundant in this environment than in the marine metagenomes where it is least frequent. This is consistent with previous results which show that individual niches could have larger variation in T6SS frequency (Barret et al., 2013), as extremes will have less of an impact when aggregated in an environmental category. A corollary to the disparity between T6SS and Tle abundance is that T6SS does not seem to be enriched by the environments which might be expected to have greater available energy or the presence of a eukaryotic host. For example, the rhizosphere and host-associated arthropod metagenomes have proportionally fewer T6SS per prokaryotic gene than bulk soil and engineered metagenomes respectively, in spite of the fact that the latter two environments would not be expected to contain a higher proportion of eukaryotes. The discrepancy between T6SS and Tle abundance suggests that T6SS may be playing a role not related to killing in the environments where Tle is underrepresented. Alternatively, killing could be achieved using a different set of effector molecules.
Divergent Tle family representation suggests niche-specific specialisation of Tle families
As Tle genes are most often present on horizontally transferred vgrG islands they are not necessarily constrained by the general evolution of the bacteria they reside in (Barret et al., 2011). For example, P. aeruginosa PA7 has Tle2 but other P. aeruginosa strains such as PAO1 do not. Does the evolution of five different families of Tle represent some degree of specialisation? As mentioned, Tle families do have unique sites of activity for phospholipid cleavage (Russell et al., 2013). If the evolution of five separate families occurred because each family was in some way specialised, and therefore more useful in certain niches, it follows that various environments would show variation in the frequency of Tle families. If the contrary is true, and no niche is exerting selective pressure to favour particular Tle families, all families should be expected to occur with similar frequencies.
While Tle occur with similar frequencies in some niches, other niches show large variations (Table S1 and Fig. 3). Distribution of the various Tle families in arthropod metagenomes is consistent, with only a 3-fold difference between the number of the least common and most common Tle families. However, some of the Tle2 hits from arthropod metagenomes may represent Tle2 effectors which have been co-opted into the insecticidal “Toxin Complex” (Yang et al., 2012). The rhizosphere has a more disparate Tle distribution, with the most abundant family (Tle2) being 10 times more abundant than the least abundant family (Tle4). Though Tle are generally uncommon in aquatic environments, Tle1 and Tle2 are relatively enriched in freshwater and marine datasets, respectively.
The most striking variation in Tle family abundance is found within the human metagenomes. Tle2 and Tle3 are completely absent but Tle1 and especially Tle5 are highly abundant. Initially, the lack of Tle2 and Tle3 families from human metagenomes was surprising because they occur in genomes of bacteria which have been isolated from, or have some association with, humans. Tle2 and Tle3 can be found in known human pathogens, but these may not be common in metagenomes of healthy humans. The discrepancy between metagenome data and genome data may also be explained by the specific sites which have been sequenced for metagenome analysis. Some human-associated bacteria possessing these Tle families, such as Massilia timonae (Lindquist et al., 2003), were isolated from areas of the body which are not represented in the available metagenome datasets.
This raises the possibility that not only are general environments selecting for Tle, but there are also specific selections depending on various niches within these environments. Few of the Tle from human metagenomes are present in the gut, which is largely explained by the paucity of the Tle5 family. This niche contributes less than 8% of the Tle5 numbers that would be expected based on the levels of Proteobacteria present. In contrast, Tle5 is highly represented in the mouth. As this is a highly specific niche we assessed the divergence of Tle sequences found in this site. It is clear from BLAST analyses that most of the Tle5 in this niche are homologous to Tle from the Haemophilus and Aggregatibacter genera. Experimental evidence will be required to determine whether the frequency of Tle5 contributes to the prevalence of these bacteria within the mouth, or whether any other Tle family would serve equally well.
As Tle is likely to be horizontally transferred, the differential abundance of Tle families could also be due to a founder effect. To test this hypothesis, the niche location of Tle within phylogenetic trees of Tle sequences was noted (Fig. S2-S6). Tle phylogeny was not well correlated with niche, except in cases of Tle from highly related strains in the same niche. This is incongruent with a potential founder effect.
Furthermore, instances where members of the same species independently acquired members of the same Tle family suggest a degree of selective pressure to obtain these genes. P. aeruginosa strains have several Tle genes, but closely related species such as P. resinovorans or P. thermotolerans have few or no Tle, which may be due to the fact that P. aeruginosa lives in several different niches. Other Tle profiles from genomes are in agreement with the metagenomic data. Tle2 is very frequent in both rhizosphere metagenomes and in soil-dwelling P. fluorescens species, while being much less common in other Pseudomonas species. Tle1 and Tle2 are enriched in aquatic metagenomes and also in Vibrio species.
An interesting feature of the analysis was the finding that some metagenomic Tle1 sequences were homologous to lipases found in the phylum Bacteroidetes. In fact, these lipases appear to be genuine effectors of another phage-derived secretion system, similar to the Photorhabdus virulence cassette/anti-feeding island, which was hypothesised to be a divergent T6SS (Yang et al., 2006; Penz et al., 2010; Zhang et al., 2012). This hypothesis was recently confirmed (Russell, Wexler, et al., 2014). As this system is less common than the T6SS in genomic sequences (for example see Persson et al., 2009) and Bacteroidetes are less common than Proteobacteria in metagenomic datasets (data not shown), most Tle1 hits obtained in metagenomes are expected to be T6SS-associated. However, BLASTP searches suggest that approximately half of the human-associated Tle1 sequences (roughly 10% of overall Tle sequences from human metagenomes) are from Bacteroidetes, possibly a reflection of the prevalence of Bacteroidetes in the mouth. Conversely, BLASTP searches suggest very few of the Tle1 sequences from other metagenomes are from Bacteroidetes.
Divergence within Tle families
During the course of the BLAST analysis of Tle5 it became clear that there were two clades emerging where several sequences were quite divergent from the majority. A maximum likelihood phylogenetic tree revealed a clear divergence between two branches of Tle5 proteins, which are referred to hereinafter as Tle5a (which includes PldA) and Tle5b (which includes PldB from P. aeruginosa and PLD1 from Klebsiella pneumoniae) (Fig. 4) (Jiang et al., 2014; Lery et al., 2014). This split between Tle5 sequences is congruent with the original phylogenetic analysis of Tle5 published by Russell and colleagues and also a recent analysis published by Jiang and colleagues (Russell et al., 2013; Jiang et al., 2014). The lack of similarity between these branches is such that members of the Tle5a sub-cluster, which includes the previously characterised PldA/Tle5 from PAO1, are more similar to phospholipases from eukaryotes than they are to their Tle5b counterparts. Tle5a are much less common both in genome and metagenome datasets, being infrequent in most niches.
An examination of the genomic context of Tle5 from genome sequences shows that this divergence is also evident in the adjacently encoded immunity genes. Tle5b genes are encoded next to putative immunity genes with Sel1 domains, and these are relatively conserved, while the less common Tle5a genes are encoded next to more divergent immunity proteins which have little homology to the Sel1-containing proteins. Indeed, Tle5b from K. pneumoniae was reported to lack a cognate immunity gene, due to the lack of similarity between the putative immunity gene Tli5b and the previously described Tli5a immunity protein (Lery et al., 2014). Protein structure can be similar in the absence of sequence homology, but Tli5a and Tli5b were predicted to have different structures by the modelling tool Phyre2 (Table S2) (Kelley and Sternberg, 2009). In addition, it was recently shown that Tli5a and Tli5b immunity proteins from P. aeruginosa were only active against Tle5a and Tle5b, respectively. Therefore, it might be useful to consider Tle5 as being composed of the two sub-groups Tle5a and Tle5b (Fig. 4).
Sequence analysis reveals conservation of motifs with residue divergence in all families of Tle in diverse ecological niches.
In the event that Tle niche specialisation were to occur, selective pressure might be expected to manifest itself in the amino acid sequences of Tles from different environments. Therefore, a comparison of Tle amino acid sequences from available metagenome datasets was undertaken, focusing on the niche-specific conservation of residues. The five Tle lipase families contain either the GxSxG motif or dual HxKxxxD motifs. Several additional conserved motifs exist in each specific family and, in line with our expectations, the areas of greatest conservation were observed in domains predicted to have lipase activity (Fig. 5) (File S2-S10). However, inter-familial alignments of Tle proteins did not identify motifs that were common to all families, apart from the conserved GxSxG motif shared by Tle1-4.
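Motifs written in this notation can be located in a protein sequence with a regular expression. The sketch below takes the motif exactly as it is written in the text and treats each lowercase x as "any residue". The function names and the toy sequence are hypothetical, and this is not the alignment-based approach used in the study.

```python
import re


def motif_to_regex(motif):
    """Convert a motif such as 'GxSxG' to a regex string ('x' = any residue)."""
    return "".join("." if ch == "x" else re.escape(ch) for ch in motif)


def find_motif(sequence, motif):
    """Return 0-based start positions of all matches, including overlapping ones."""
    # A zero-width lookahead lets overlapping occurrences be reported.
    pattern = re.compile("(?=" + motif_to_regex(motif) + ")")
    return [match.start() for match in pattern.finditer(sequence)]


# Toy sequence containing one copy of each motif (not a real Tle).
toy = "MAGHSAGKLHTKAAADW"
gxsxg_hits = find_motif(toy, "GxSxG")
hkd_hits = find_motif(toy, "HxKxxxD")
```

A "dual" motif check would simply require `find_motif` to return at least two positions for the same sequence.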
The lack of any conserved motifs between Tle families supports the hypothesis that these families have unique features. These unique features may be amenable to selection by specific niches, resulting in the differential Tle family abundance reported above. To investigate this, sequences of the same Tle family from different niches were compared, and analysed for the occurrence of changes in highly conserved amino acid residues. Several niche-specific conservations were identified, including residue 2596 in the Tle5 alignment, where the polar uncharged amino acid asparagine (N) dominates in sequences from human metagenomes but the positively charged histidine (H) dominates in sequences from arthropod metagenomes (Fig. 6). More frequent were residues which were quite conserved in sequences from one environment, but not in sequences from another environment (Fig. 6). Due to the partial nature of many metagenome sequences, in many positions there is limited coverage, which may mask other niche-specific residue changes.
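One way to screen an alignment for such niche-specific residues is to call the dominant residue per column within each niche and report columns where the niches disagree. The following sketch assumes pre-aligned sequences grouped by niche. The 0.8 dominance threshold, the function names and the toy alignments are illustrative assumptions, not parameters from this study.

```python
from collections import Counter


def dominant_residue(aligned_seqs, column, gap_chars="-"):
    """Return (residue, fraction) for the commonest residue in a column.

    Gaps are ignored, so partial sequences do not count against conservation.
    Returns None when no sequence covers the column.
    """
    residues = [seq[column] for seq in aligned_seqs
                if column < len(seq) and seq[column] not in gap_chars]
    if not residues:
        return None
    residue, count = Counter(residues).most_common(1)[0]
    return residue, count / len(residues)


def niche_specific_columns(alignments_by_niche, min_fraction=0.8):
    """Map column index -> {niche: residue} where conserved residues differ."""
    length = min(len(seq) for seqs in alignments_by_niche.values() for seq in seqs)
    results = {}
    for col in range(length):
        calls = {niche: dominant_residue(seqs, col)
                 for niche, seqs in alignments_by_niche.items()}
        # Skip columns lacking coverage or strong conservation in any niche.
        if any(call is None or call[1] < min_fraction for call in calls.values()):
            continue
        if len({call[0] for call in calls.values()}) > 1:
            results[col] = {niche: call[0] for niche, call in calls.items()}
    return results


# Toy alignments: column 3 is N in "human" and H in "arthropod".
toy_alignments = {
    "human": ["GASNG", "GASNG", "GTSNG"],
    "arthropod": ["GASHG", "GASHG", "GASHG"],
}
differences = niche_specific_columns(toy_alignments)
```

Because gaps are skipped rather than counted, a column covered by only one or two partial reads can still pass the threshold, so a minimum-coverage filter would be a sensible addition for real metagenomic data.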
Different environments select for certain species, and whether the variation in residues of Tle from different niches was due to genetic drift in these species or some selective pressure driving adaptation will remain unclear until further functional and bioinformatic studies are completed. Although our data do not support a role for a founder effect, demonstrating convergence of each polymorphism is currently restricted by limitations in the available metagenomic datasets, where individual species or genera dominate the available genetic information from each niche. While convergent evolution would be the logical outcome of selective pressure at certain residue sites, more sequencing data, and possibly experimental data, are needed before this can be demonstrated. However, these data are congruent with the hypothesis that selective pressure within different environments may manifest in divergent Tle sequences.
Conclusion
Analysis of the distribution of a widespread effector superfamily is a novel and potentially more accurate way of examining levels of T6SS-mediated competition and killing within environments. The differing distribution patterns of the Tle superfamily across niches offer insights into bacterial activity within each niche. Moreover, the differential abundance of Tle families within certain niches, as well as the presence of different Tle families within the same genomes, suggests there is specialisation of the various Tle families. Indeed, a recent report identified niche specialisation of the twin-arginine translocation system, with different levels of selective pressure on the TatC protein, focused on certain positions in the protein, evident in different strains (Simone et al., 2013). This suggests that these phenomena may be more widespread generally among bacterial secretion systems. While the wealth of metagenomic data now available presents an opportunity to find ways to study ecosystems directly, environmental factors can be diverse even within niches, and increased sampling will further inform the analysis and conclusions of this study.
Acknowledgments
The authors would like to thank Dr. Marlies Mooij for valuable discussions.
This research was supported in part by grants awarded by the European Commission (FP7-PEOPLE-2013-ITN, 607786; FP7-KBBE-2012-6, CP-TP-312184; FP7-KBBE-2012-6, 311975; OCEAN 2011-2, 287589; Marie Curie 256596), Science Foundation Ireland (SSPC-2, 12/RC/2275; 07/IN.1/B948; 13/TIDA/B2625; 12/TIDA/B2411; 12/TIDA/B2405; 09/RFP/BMT2350), the Department of Agriculture and Food (FIRM/RSF/CoFoRD; FIRM 08/RDC/629), the Irish Research Council for Science, Engineering and Technology (PD/2011/2414; RS/2010/2413), the Health Research Board (HRA/2009/146), the Environmental Protection Agency (EPA2008-PhD-S-2), the Marine Institute (Beaufort award C2CRA 2007/082), Teagasc (Walsh Fellowship 2013) and the Higher Education Authority of Ireland (PRTLI4).
References
376 377 378 379
Aschtgen, M.-S., Bernard, C.S., Bentzmann, S.D., Lloubès, R., and Cascales, E. (2008) SciN Is an Outer Membrane Lipoprotein Required for Type VI Secretion in Enteroaggregative Escherichia coli. J. Bacteriol. 190: 7523–7531.
380 381 382 383
Barret, M., Egan, F., Fargier, E., Morrissey, J.P., and O’Gara, F. (2011) Genomic analysis of the type VI secretion systems in Pseudomonas spp.: novel clusters and putative effectors uncovered. Microbiology 157: 1726–1739.
384 385 386 387 388
Barret, M., Egan, F., and O’Gara, F. (2013) Distribution and diversity of bacterial secretion systems across metagenomic datasets. Environ. Microbiol. Rep. 5: 117–126. Crooks, G.E., Hon, G., Chandonia, J.-M., and Brenner, S.E. (2004) WebLogo: A Sequence Logo Generator. Genome Res. 14: 1188–1190.
389 390 391 392
Dong, T.G., Ho, B.T., Yoder-Himes, D.R., and Mekalanos, J.J. (2013) Identification of T6SS-dependent effector and immunity proteins by Tn-seq in Vibrio cholerae. Proc. Natl. Acad. Sci. 110: 2623–2628.
393 394 395 396
Enos-Berlage, J.L., Guvener, Z.T., Keenan, C.E., and McCarter, L.L. (2005) Genetic determinants of biofilm development of opaque and translucent Vibrio parahaemolyticus. Mol. Microbiol. 55: 1160–1182.
Filloux, A. (2013) The rise of the Type VI secretion system. F1000Prime Rep. 5:
Gueguen, E., Durand, E., Zhang, X.Y., d’Amalric, Q., Journet, L., and Cascales, E. (2013) Expression of a Yersinia pseudotuberculosis Type VI Secretion System Is Responsive to Envelope Stresses through the OmpR Transcriptional Activator. PLoS ONE 8: e66615.
Huang, Y., Niu, B., Gao, Y., Fu, L., and Li, W. (2010) CD-HIT Suite: a web server for clustering and comparing biological sequences. Bioinformatics 26: 680–682.
Jiang, F., Waterfield, N.R., Yang, J., Yang, G., and Jin, Q. (2014) A Pseudomonas aeruginosa Type VI Secretion Phospholipase D Effector Targets Both Prokaryotic and Eukaryotic Cells. Cell Host Microbe 15: 600–610.
Kelley, L.A. and Sternberg, M.J.E. (2009) Protein structure prediction on the Web: a case study using the Phyre server. Nat. Protoc. 4: 363–371.
Lery, L.M., Frangeul, L., Tomas, A., Passet, V., Almeida, A.S., Bialek-Davenet, S., et al. (2014) Comparative analysis of Klebsiella pneumoniae genomes identifies a phospholipase D family protein as a novel virulence factor. BMC Biol. 12: 41.
Lindquist, D., Murrill, D., Burran, W.P., Winans, G., Janda, J.M., and Probert, W. (2003) Characteristics of Massilia timonae and Massilia timonae-Like Isolates from Human Patients, with an Emended Description of the Species. J. Clin. Microbiol. 41: 192–196.
Ma, L.-S., Hachani, A., Lin, J.-S., Filloux, A., and Lai, E.-M. (2014) Agrobacterium tumefaciens Deploys a Superfamily of Type VI Secretion DNase Effectors as Weapons for Interbacterial Competition In Planta. Cell Host Microbe 16: 94–104.
Penz, T., Horn, M., and Schmitz-Esser, S. (2010) The genome of the amoeba symbiont “Candidatus Amoebophilus asiaticus” encodes an afp-like prophage possibly used for protein secretion. Virulence 1: 541–545.
Persson, O.P., Pinhassi, J., Riemann, L., Marklund, B.-I., Rhen, M., Normark, S., et al. (2009) High abundance of virulence gene homologues in marine bacteria. Environ. Microbiol. 11: 1348–1357.
Russell, A.B., LeRoux, M., Hathazi, K., Agnello, D.M., Ishikawa, T., Wiggins, P.A., et al. (2013) Diverse type VI secretion phospholipases are functionally plastic antibacterial effectors. Nature 496: 508–512.
Russell, A.B., Peterson, S.B., and Mougous, J.D. (2014) Type VI secretion system effectors: poisons with a purpose. Nat. Rev. Microbiol. 12: 137–148.
Russell, A.B., Wexler, A.G., Harding, B.N., Whitney, J.C., Bohn, A.J., Goo, Y.A., et al. (2014) A Type VI Secretion-Related Pathway in Bacteroidetes Mediates Interbacterial Antagonism. Cell Host Microbe 16: 227–236.
Simone, D., Bay, D.C., Leach, T., and Turner, R.J. (2013) Diversity and Evolution of Bacterial Twin Arginine Translocase Protein, TatC, Reveals a Protein Secretion System That Is Evolving to Fit Its Environmental Niche. PLoS ONE 8: e78742.
Tamura, K., Peterson, D., Peterson, N., Stecher, G., Nei, M., and Kumar, S. (2011) MEGA5: Molecular Evolutionary Genetics Analysis Using Maximum Likelihood, Evolutionary Distance, and Maximum Parsimony Methods. Mol. Biol. Evol. 28: 2731–2739.
Waterfield, N.R., Sanchez-Contreras, M., Eleftherianos, I., Dowling, A., Yang, G., Wilkinson, P., et al. (2008) Rapid Virulence Annotation (RVA): Identification of virulence factors using a bacterial genome library and multiple invertebrate hosts. Proc. Natl. Acad. Sci. 105: 15967–15972.
Weber, B., Hasic, M., Chen, C., Wai, S.N., and Milton, D.L. (2009) Type VI secretion modulates quorum sensing and stress response in Vibrio anguillarum. Environ. Microbiol. 11: 3018–3028.
Yang, G., Dowling, A.J., Gerike, U., ffrench-Constant, R.H., and Waterfield, N.R. (2006) Photorhabdus Virulence Cassettes Confer Injectable Insecticidal Activity against the Wax Moth. J. Bacteriol. 188: 2254–2261.
Yang, G., Hernández-Rodríguez, C.S., Beeton, M.L., Wilkinson, P., ffrench-Constant, R.H., and Waterfield, N.R. (2012) Pdl1 Is a Putative Lipase that Enhances Photorhabdus Toxin Complex Secretion. PLoS Pathog. 8: e1002692.
Zhang, D., Souza, R.F. de, Anantharaman, V., Iyer, L.M., and Aravind, L. (2012) Polymorphic toxin systems: Comprehensive characterization of trafficking modes, processing, mechanisms of action, immunity and ecology using comparative genomics. Biol. Direct 7: 1–76.
Zhang, W., Wang, Y., Song, Y., Wang, T., Xu, S., Peng, Z., et al. (2013) A type VI secretion system regulated by OmpR in Yersinia pseudotuberculosis functions to maintain intracellular pH homeostasis. Environ. Microbiol. 15: 557–569.
Table and figure legends
Figure 1. Total Tle abundance (Tle/proteobacterial gene count) in environmental datasets shows variation, with high frequency of Tle correlated with host-associated metagenomes.
Figure 2. (A) T6SS abundance in environmental datasets was calculated similarly to Tle abundance (T6SS gene COG3516/proteobacterial gene count). (B) Comparison of T6SS and Tle frequency shows that T6SS and Tle abundance are dissimilar. Due to the limited variation in T6SS numbers, Tle genes per T6SS are highest in host-associated metagenomes, mirroring the overall Tle distribution.
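The normalisation described for Figures 1 and 2 can be illustrated with a minimal Python sketch. It assumes that per-metagenome counts of Tle hits, T6SS marker (COG3516) hits and predicted proteobacterial genes are already available. The function names and the example counts are hypothetical and are not taken from the study's data.

```python
def normalised_abundance(hits: int, proteobacterial_genes: int) -> float:
    """Hits per predicted proteobacterial gene (the ratio used in Figs 1 and 2A)."""
    if proteobacterial_genes <= 0:
        raise ValueError("proteobacterial gene count must be positive")
    return hits / proteobacterial_genes


def tle_per_t6ss(tle_hits: int, t6ss_hits: int) -> float:
    """Tle genes per T6SS marker gene (COG3516), as compared in Fig. 2B."""
    if t6ss_hits <= 0:
        raise ValueError("T6SS hit count must be positive")
    return tle_hits / t6ss_hits


# Hypothetical counts for a single metagenome (illustrative only).
example = {"tle": 30, "t6ss": 12, "proteobacterial_genes": 60000}

tle_abundance = normalised_abundance(example["tle"], example["proteobacterial_genes"])
t6ss_abundance = normalised_abundance(example["t6ss"], example["proteobacterial_genes"])
ratio = tle_per_t6ss(example["tle"], example["t6ss"])
```

Dividing by the proteobacterial gene count rather than the total gene count keeps datasets with very different proteobacterial content comparable, which appears to be the intent of the normalisation in the legends.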
Figure 3. The frequency of Tle families in metagenomic data from various environmental categories shows marked variation in Tle family distribution in several areas. The highest numbers of three families occur in arthropods, while Tle2 and Tle5 are most common in the rhizosphere and in the human microbiome. Data are again displayed as Tle hits per proteobacterial gene; Tle family abundance across all metagenomes, weighted by proteobacterial abundance per contributing environment, is reflected by the aggregate length of the bars for each Tle family.
Figure 4. Maximum likelihood trees of Tle5 (A) and Tli5 (B) proteins from genome sequences were generated with MEGA5 using the WAG amino acid substitution model with 1000 bootstrap replicates (displayed as percentages) (Tamura et al., 2011). The Tle5a proteins and the corresponding immunity proteins are coloured in red, and group separately from the Tle5b and Tli5b proteins. As Tle5b proteins are much more common than Tle5a proteins, only representatives of Tle5b were used in the tree.
Figure 5. WebLogo images showing consensus motifs of Tle5 protein sequences. Upper logos represent sequences from arthropod metagenomes, while the lower logos represent sequences from human metagenomes. Genomic sequences for each Tle were aligned using MAFFT E-INS-i with the BLOSUM62 substitution matrix (Kato et al., 2013), and metagenomic sequences were added to the genomic alignment using the add fragments option on the MAFFT server. Consensus sequences were derived using WebLogo 3 (Crooks et al., 2004), after generation of a non-redundant set of sequences using CD-HIT to eliminate sequences showing 100% identity (Huang et al., 2010).
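The redundancy-removal step can be sketched in Python as follows. This is not CD-HIT itself: the sketch only drops exact duplicate sequences, whereas CD-HIT run at 100% identity can also absorb a shorter sequence that is identical to part of a longer one. The function name and the example sequences are hypothetical.

```python
from typing import Dict


def drop_exact_duplicates(records: Dict[str, str]) -> Dict[str, str]:
    """Keep the first record seen for each distinct sequence (case-insensitive)."""
    seen = set()
    unique = {}
    for name, seq in records.items():
        key = seq.upper()
        if key not in seen:
            seen.add(key)
            unique[name] = seq
    return unique


# Hypothetical sequences (illustrative only); seqB duplicates seqA.
records = {
    "seqA": "MKTLLGHSLG",
    "seqB": "MKTLLGHSLG",
    "seqC": "MKTAAGHSMG",
}
non_redundant = drop_exact_duplicates(records)
```

Removing identical sequences before building a logo prevents a heavily sampled variant from dominating the residue frequencies at each position.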
Figure 6. Consensus sequences for two regions in (A) Tle2 and (B) Tle5 show differences in conserved residues between sequences found in different environments. For Tle2, sequences from arthropod metagenomes (top) are compared with sequences from rhizosphere metagenomes (bottom). For Tle5, sequences from arthropod metagenomes (top) are compared with sequences from human metagenomes (bottom).
Table S1. Tle and Proteobacterial gene counts in assembled metagenomes from different environments. Predicted Proteobacterial genes in metagenomes were calculated using the metagenomes vs genomes tool on IMG with 60% identity used as a cut-off.
Table S2. Tle and Tli amino acid sequences from P. aeruginosa were analysed using Phyre2 (Kelley and Sternberg, 2009). The best Phyre2 hit is shown for each sequence except for Tli5b, where the next best hit is also shown because it is another high-confidence, T6SS-associated hit. Interestingly, both Tli4 and Tli5a are modelled with high confidence to the Mog1/PsbP structure, which is strongly suggested to be associated with T6SS by gene neighbourhood linking. Currently these proteins have no known role in T6SS, but are hypothesised to be adapters between the T6SS structure and effectors (Zhang et al., 2012). The confidence in the model, the percentage identity shared between the query sequence and the template, and the percentage of the query sequence length that is modelled are displayed. As expected, all Tle sequences were best matched to other lipases.
Fig. S1. Proteobacterial composition of human-associated metagenomes. Predicted Proteobacterial genes in metagenomes were calculated using the metagenomes vs genomes tool on IMG with 60% identity used as a cut-off.
Fig. S2-S6. Bootstrap consensus maximum likelihood trees of Tle families, noting the habitat from which the bacterium was isolated: human (H), fresh water (F), marine (M), plant (P) and arthropod (A). The similarity of Tle sequences from different environments argues against Tle family distribution in metagenomes being due to a founder effect. Trees of proteins from genome sequences were generated with MEGA5 using the WAG amino acid substitution model with 500 bootstrap replicates (Tamura et al., 2011).
File S1. Bait sequences for BLAST analyses of metagenomes.
File S2. Consensus sequences of Tle1 from arthropod metagenomes.
File S3. Consensus sequences of Tle2 from arthropod metagenomes.
File S4. Consensus sequences of Tle3 from arthropod metagenomes.
File S5. Consensus sequences of Tle4 from arthropod metagenomes.
File S6. Consensus sequences of Tle5 from arthropod metagenomes.
File S7. Consensus sequences of Tle2 from human metagenomes.
File S8. Consensus sequences of Tle4 from human metagenomes.
File S9. Consensus sequences of Tle5 from human metagenomes.
File S10. Consensus sequences of Tle2 from rhizosphere metagenomes.