Environmental Microbiology Reports

Accepted Article


Tle Distribution and Diversity in Metagenomic Datasets Reveal Niche Specialisation


Egan, F.1, Reen, F.J.1 and O’Gara, F.*1,2

1 BIOMERIT Research Centre, School of Microbiology, University College Cork, Cork, Ireland.

2 Curtin University, School of Biomedical Sciences, Perth, WA, Australia.


Running Title: Tle niche specialisation.


This article has been accepted for publication and undergone full peer review but has not been through the copyediting, typesetting, pagination and proofreading process, which may lead to differences between this version and the Version of Record. Please cite this article as doi: 10.1111/1758-2229.12222 This article is protected by copyright. All rights reserved.

* Correspondence: Prof. Fergal O’Gara, BIOMERIT Research Centre, Department of Microbiology, University College Cork, Ireland. Tel: +353 (0)21 427 2097. Fax: +353 (0)21 427 5934. E-mail: [email protected].

Summary

The existence of microbial communities and the complex interactions that govern their dynamics have received considerable attention in recent years. Advances in genomic sequencing technologies have greatly enhanced our understanding of ‘what is there’. However, the question of ‘what are they doing’ remains less well defined. The continual development of genomic and metagenomic sequence databases provides an exciting opportunity to interrogate the distribution and prevalence of key microbial systems across a diverse set of ecosystems. The widely distributed Type Six Secretion System (T6SS) has been shown to play a significant role in bacterial-bacterial and bacterial-host interactions. While several T6SS effectors have been shown to target the cell wall and membrane of competing cells, little is known about the roles these proteins play in different ecosystems. Therefore, the prevalence of a key T6SS effector superfamily, the type six lipase effectors (Tle), was studied in over 2000 metagenomic datasets representing diverse ecosystems and host niches. Increased Tle representation in particular environmental categories strongly supports the hypothesis of niche specialisation and suggests that these effectors may play important niche-specific roles.


Introduction

Bacterial interactions underpin community function and are often dependent on secretion systems. In Gram-negative bacteria, six specialised secretion systems have been described, the Type I to Type VI Secretion Systems (T1SS to T6SS). The most recently uncovered of these, the T6SS, has emerged as a highly significant component of the bacterial interactome (Russell, Peterson, et al., 2014). Several reports suggest roles for the T6SS in attachment and biofilm formation, the general stress response, osmotolerance, and maintenance of pH homeostasis by H+ ion secretion (Enos-Berlage et al., 2005; Aschtgen et al., 2008; Weber et al., 2009; Gueguen et al., 2013; Zhang et al., 2013). However, it is most widely regarded as a weapon of interbacterial warfare, being involved in virulence towards both prokaryotic and eukaryotic organisms (Filloux, 2013). Of particular interest is the role of the T6SS in moderating population dynamics within an ecosystem. The growing awareness of the polymicrobial communities that exist and thrive in most ecological and clinical niches has heightened interest in microbial factors that can control population dynamics. While experimental models that facilitate the investigation and dissection of polymicrobial interactomes continue to emerge, the explosion in genomic and metagenomic datasets provides an ideal opportunity to gain significant insights into the physiological role of these secretion systems in that context, i.e. within communities.

63

Secretion system abundance (including T6SS abundance) in metagenomes has been studied

64

previously (Persson et al., 2009; Barret et al., 2013), and it appears that T6SS is over-

65

represented in niche-specific environments. Importantly however, given the diverse This article is protected by copyright. All rights reserved.

functionality of T6SS, it is not clear to what extent this abundance reflects the importance of

67

T6SS-mediated killing in these environments, a process which is carried out by the action of

68

effector molecules. Several T6SS effector superfamilies have been identified to date: T6S

69

amidase effectors (Tae), T6S glycosidase hydrolase effectors (Tge), T6S lipase effectors

70

(Tle) and T6S DNase effectors (Tde) (Ma et al., 2014; Russell, Peterson, et al., 2014). While

71

Tae and Tge target the bacterial cell wall, Tle are phospholipases which target the cell

72

membrane while Tde targets DNA (Russell et al., 2013). Furthermore, individual families

73

within these superfamilies have specific sites of action due to their own characteristic

74

enzymatic activities. For example, Tle1, Tle2 and Tle5 have PLA2, PLA1 and PLD activity,

75

respectively (Russell et al., 2013).

Of these effector superfamilies, Tle appears to be the most widespread in genomic sequences, being encoded in a broad spectrum of bacterial species, including a range of emerging and established pathogens (Russell et al., 2013). Moreover, while all T6SS effector superfamilies have been shown to function in interbacterial competition, Tle has also been implicated in virulence towards eukaryotes. Indeed, the first Tle effector shown to be secreted in a T6SS-dependent manner was Tle2VC/TseL from Vibrio cholerae, which was found to bind to VgrG and was necessary for efficient killing of amoebae (Dong et al., 2013). Another study identified tle genes within genomic regions whose disruption resulted in loss of virulence towards diverse eukaryotic hosts, including macrophages, mice, amoebae and insects (Waterfield et al., 2008). Two recent publications also demonstrate that Tle proteins can mediate bacterial-eukaryote interactions (Jiang et al., 2014; Lery et al., 2014). As with the T6SS itself, multiple tle genes can be encoded in the same genome, and these are usually of distinct phylogenetic origins. For example, five tle genes are present in Pseudomonas aeruginosa PA14 (Tle1, Tle3, Tle4 and two copies of Tle5). The existence of multiple Tle families, and the fact that several can be found within the same genome, suggests that each family may have some degree of specialisation. While the membrane-targeting activity of these effectors appears to have arisen through convergent evolution, a majority of Tle are encoded on horizontally acquired island regions linked to vgrG loci (Barret et al., 2011). Although sequence conservation between families may be limited, examining niche specialisation might provide some insight into the role of these proteins in shaping microbial populations in diverse environments.

In this study, we therefore investigated the distribution and abundance of tle genes in genomic and metagenomic databases. The relatively widespread nature of the T6SS and Tle makes Tle a good candidate for a highly focused study of bacterial-mediated killing in different ecosystems, and potentially a good proxy for the level of competition within an environment. Tle distribution was distinct from T6SS distribution, and each Tle family exhibited a distinct pattern of distribution across the individual ecosystems. Several niche-specific signatures were identified, most notably a high abundance of Tle5 in human samples. Furthermore, changes in conserved Tle residue motifs between niches provide additional evidence of selection, although biochemical and genetic analyses will be required to evaluate their importance. Taken together, these data strongly support a role for Tle effector proteins in moderating microbial community structure in a broad spectrum of ecosystems, marking them as key targets for therapeutic development.


Results and discussion


Uncovering Tle diversity in genomic and metagenomic datasets

Tle proteins are emerging as key factors in mediating microbial-microbial and microbial-host interactions, suggesting a role in shaping microbiome structures across a wide spectrum of ecological niches. Analysis of the distribution and abundance of tle genes in a range of diverse ecosystems would provide important insights into the functionality of this major class of secretion effector and uncover the potential for niche specialisation. The recent dramatic increase in publicly available genome and metagenome sequences provided an ideal opportunity to pursue this goal. The IMG database now contains over 2000 metagenomes, many of which can be assigned to the broad environmental categories marine, freshwater, human, arthropod, rhizosphere/soil and engineered. Tle occurrence was determined for each environment. No Pfam or COG domain was found to be both specific and exclusive to any individual Tle family. Therefore, to avoid false positives and false negatives, Tle sequences were identified using BLASTP searches against the IMG database. To establish the veracity of this method and to generate bait sequences, multiple rounds of BLASTP analysis were performed using genomic datasets on IMG, and results were limited to hits encoded adjacent to a vgrG gene using the IMG gene neighbourhood tool. After several iterations, a list comprising the smallest set of diverse Tle baits needed to recover all Tle hits from genomic data was generated (File S1). As this approach is based on previously described Tle, highly divergent members of known Tle families, or members of currently unidentified Tle families, would not be detected in this analysis.

Previous analysis of secretion system abundance in metagenomes normalised incidence by the number of predicted Proteobacterial genes in each dataset, as the most studied secretion systems are overwhelmingly present in this phylum (Barret et al., 2013). This ensured that secretion system frequency was not simply an artefact of Proteobacterial frequency or of the different amounts of sequence data in each metagenome. As genomic analysis revealed that Tle are also predominantly encoded in Proteobacterial genomes, the same normalisation was employed in this study. As expected, the overall genomic distribution of Tle was similar to the previously reported genomic distribution of the T6SS: Tle are predominantly present in Proteobacteria and occasionally found in Acidobacteria and Planctomycetes. In absolute terms, 2078 Tle were identified in metagenomes, and the number of Tle per environmental category ranged from 31 in soil to 1059 in human. In relative terms, Tle numbers ranged from approximately 1 per 4000 prokaryote genes in arthropod metagenomes to 1 per 700,000 prokaryote genes in soil metagenomes.
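The normalisation described above amounts to dividing each category's Tle count by its predicted Proteobacterial gene count, while the absolute figures are quoted as "1 Tle per N prokaryote genes". A minimal sketch of both calculations follows; the per-10,000-gene scaling and the toy counts in the usage lines are hypothetical and are not the study's data.

```python
def tle_per_proteobacterial_genes(tle_count, proteobacterial_genes, per=10_000):
    """Tle hits per `per` predicted Proteobacterial genes in a metagenome set."""
    if proteobacterial_genes <= 0:
        raise ValueError("Proteobacterial gene count must be positive")
    return tle_count * per / proteobacterial_genes


def genes_per_tle(tle_count, prokaryote_genes):
    """Express abundance as '1 Tle per N prokaryote genes'."""
    if tle_count <= 0:
        raise ValueError("no Tle found; ratio undefined")
    return prokaryote_genes / tle_count


if __name__ == "__main__":
    # Toy numbers for two hypothetical environmental categories
    categories = {
        "category_A": {"tle": 50, "proteo": 1_000_000, "prok": 2_000_000},
        "category_B": {"tle": 5, "proteo": 2_500_000, "prok": 10_000_000},
    }
    for name, c in categories.items():
        freq = tle_per_proteobacterial_genes(c["tle"], c["proteo"])
        ratio = genes_per_tle(c["tle"], c["prok"])
        print(f"{name}: {freq:.2f} Tle per 10,000 Proteobacterial genes; "
              f"1 Tle per {ratio:,.0f} prokaryote genes")
```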


Environment type determines Tle numbers

Perhaps unsurprisingly, given the diversity of the environmental categories studied, considerable differences in total Tle frequency were evident between these metagenome sets (Fig. 1).

It is particularly notable that the arthropod, rhizosphere and human metagenomes, all host-associated niches, contain a high abundance of tle genes, whereas Tle is relatively infrequent in aquatic environments and in bulk soil. This may reflect the involvement of Tle in interactions with eukaryotes, or it may indicate that Tle abundance tracks the level of activity and competition within these niches. Indeed, previous analysis of aquatic metagenomes found that secretion system genes were much more prevalent in productive waters (Persson et al., 2009). However, a role for Tle proteins in marine and soil ecosystems cannot be ruled out, and other factors may also contribute to lower Tle abundance. The scarcity of Tle in aquatic environments might be due to low activity within this niche, or could reflect that open-water niches are not conducive to a contact-dependent method of bacterial killing. Greater sampling of sediments or other sites of high bacterial density in these environments may result in greater Tle representation in aquatic environments, although tle genes are not more abundant in the limited number of sediment-based aquatic metagenomes currently available. One caveat to these results is that certain sites can dominate an environmental category. For example, among the freshwater metagenomes, roughly 45% of the prokaryote genes and 90% of T6SS come from a single site: wetland microbial communities from Twitchell Island in the Sacramento Delta. Despite this, only 12% of freshwater Tle come from this source. A closer look at the contribution of individual metagenomes to the Tle frequency in their environmental category yields further interesting observations. Fungus-associated arthropod metagenomes make up about 5.6% of total arthropod metagenome DNA but account for at least 25% of all arthropod Tle, suggesting that these Tle may be involved in bacterial-fungal interactions.

Within human metagenomes, Proteobacteria are much more common in the oral niche than in stool (Fig. S1), and Tle frequency is significantly higher in the mouth than in the gut, even allowing for this differential Proteobacterial representation. Indeed, the gut contributes less than 20% of the Tle that would be expected based on its Proteobacterial gene count. Whether Tle should be considered over-represented in the mouth or under-represented in the gut is a matter of perspective, but these data certainly support a contribution of Tle proteins to bactericidal activity in the oral niche.
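The statement that the gut contributes less than 20% of the Tle expected from its Proteobacterial gene count implies an expected value proportional to each site's share of Proteobacterial genes. The sketch below computes that observed/expected ratio and, as one possible significance check, a Poisson lower-tail probability. The text does not state which test underlies "significantly higher", so the Poisson model is an assumption, and the counts in the usage lines are hypothetical.

```python
import math


def expected_tle(category_tle, site_proteo_genes, category_proteo_genes):
    """Tle expected at a site if Tle tracked its share of Proteobacterial genes exactly."""
    return category_tle * site_proteo_genes / category_proteo_genes


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam); each term is built in log space so that
    exp(-lam) does not underflow for large expected counts."""
    if lam <= 0:
        return 1.0
    total = 0.0
    for i in range(k + 1):
        total += math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))
    return min(total, 1.0)


if __name__ == "__main__":
    # Hypothetical: 200 Tle in a category; the site holds 30% of its Proteobacterial genes
    exp_count = expected_tle(200, 300_000, 1_000_000)  # 60.0
    observed = 10
    print(f"observed/expected = {observed / exp_count:.2f}")
    print(f"P(X <= {observed}) = {poisson_cdf(observed, exp_count):.2e}")
```

Because a site's count is drawn from a fixed category total, a binomial test conditioned on that total would be a close alternative to the Poisson approximation shown here.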


Tle abundance differs from T6SS abundance

The niche-differential abundance of Tle proteins suggests an important and specific role for these effectors in polymicrobial environments. While Tle function appears to be specific, the role of the T6SS itself is more functionally diverse. We therefore considered the possibility that Tle abundance would not necessarily reflect T6SS abundance. To test this hypothesis, the correlation between T6SS frequency and Tle frequency in the datasets was examined. T6SS frequency in the various environments was assessed by applying a previously published method (Barret et al., 2013) to new metagenomic datasets. As shown in Fig. 2, the abundance of T6SS does not reflect the abundance of Tle, as T6SS numbers are relatively similar across the metagenome datasets. The human metagenomes have the highest frequency of T6SS, but T6SS is only 2.4 times more abundant in this environment than in the marine metagenomes, where it is least frequent. This is consistent with previous results showing that individual niches can vary more widely in T6SS frequency (Barret et al., 2013), as extremes have less of an impact when aggregated into an environmental category. A corollary of the disparity between T6SS and Tle abundance is that T6SS does not appear to be enriched in environments that might be expected to have greater available energy or the presence of a eukaryotic host. For example, the rhizosphere and host-associated arthropod metagenomes have proportionally fewer T6SS per prokaryotic gene than bulk soil and engineered metagenomes, respectively, even though the latter two environments would not be expected to contain a higher proportion of eukaryotes. The discrepancy between T6SS and Tle abundance may indicate that the T6SS plays a role unrelated to killing in environments where Tle is under-represented. Alternatively, killing could be achieved using a different set of effector molecules.
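The correlation between T6SS and Tle frequency across environmental categories can be checked with a rank correlation, which does not assume the two frequencies are linearly related. The Spearman implementation below (average ranks for ties) is a self-contained sketch; the text does not say which correlation measure was used, and the per-category frequencies in the usage lines are placeholders, not values from Fig. 2.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length series with at least two points")
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Placeholder frequencies per environmental category (not the study's values)
    t6ss = [1.0, 1.3, 1.8, 2.4]
    tle = [0.2, 3.0, 0.5, 1.1]
    print(f"Spearman rho = {spearman(t6ss, tle):.2f}")
    print(f"max/min T6SS fold = {max(t6ss) / min(t6ss):.1f}")
```

On Python 3.12 or later, `statistics.correlation(x, y, method="ranked")` computes the same coefficient; the explicit version is kept here so the ranking step is visible. With only six environmental categories, any such coefficient carries wide uncertainty.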


Divergent Tle family representation suggests niche-specific specialisation of Tle families

As tle genes are most often present on horizontally transferred vgrG islands, they are not necessarily constrained by the general evolution of the bacteria in which they reside (Barret et al., 2011). For example, P. aeruginosa PA7 has Tle2, but other P. aeruginosa strains such as PAO1 do not. Does the evolution of five different Tle families represent some degree of specialisation? As mentioned, Tle families do have unique sites of activity for phospholipid cleavage (Russell et al., 2013). If five separate families evolved because each family was in some way specialised, and therefore more useful in certain niches, it follows that different environments would show variation in the frequency of Tle families. If the contrary is true, and no niche exerts selective pressure favouring particular Tle families, all families would be expected to occur at similar frequencies.

While Tle families occur at similar frequencies in some niches, other niches show large variations (Table S1 and Fig. 3). The distribution of Tle families in arthropod metagenomes is relatively even, with only a 3-fold difference between the least and most common Tle families. However, some of the Tle2 hits from arthropod metagenomes may represent Tle2 effectors that have been co-opted into the insecticidal “Toxin Complex” (Yang et al., 2012). The rhizosphere has a more uneven Tle distribution, with the most abundant family (Tle2) being 10 times more abundant than the least abundant family (Tle4). Although Tle are generally uncommon in aquatic environments, Tle1 and Tle2 are relatively enriched in freshwater and marine datasets, respectively.
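The null expectation set out earlier (no niche favouring particular families, so all families occur at similar frequencies) corresponds to a goodness-of-fit test against a uniform distribution. A minimal sketch follows, together with the max/min fold difference quoted above. The family counts are hypothetical; the closed-form p-value holds only for an even number of degrees of freedom (four, for the five families Tle1 to Tle5), and for other cases `scipy.stats.chisquare` performs the same test. With small counts (expected values below about five) the chi-square approximation is unreliable, and an exact multinomial test would be preferable.

```python
import math


def chi_square_uniform(counts):
    """Pearson chi-square statistic against equal expected counts per family."""
    expected = sum(counts) / len(counts)
    if expected == 0:
        raise ValueError("no observations")
    return sum((c - expected) ** 2 / expected for c in counts)


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable with even df:
    exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!"""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive even df")
    half = x / 2
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def fold_range(counts):
    """Ratio of the most to the least common family (inf if a family is absent)."""
    low = min(counts)
    return math.inf if low == 0 else max(counts) / low


if __name__ == "__main__":
    # Hypothetical counts for Tle1..Tle5 in one environmental category
    counts = [40, 25, 30, 15, 45]
    stat = chi_square_uniform(counts)
    p = chi2_sf_even_df(stat, df=len(counts) - 1)
    print(f"chi2 = {stat:.2f}, p = {p:.3g}, fold range = {fold_range(counts):.1f}")
```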

The most striking variation in Tle family abundance is found within the human metagenomes: Tle2 and Tle3 are completely absent, whereas Tle1 and especially Tle5 are highly abundant. The absence of Tle2 and Tle3 from human metagenomes was initially surprising, because these families occur in the genomes of bacteria that have been isolated from, or have some association with, humans. Tle2 and Tle3 can be found in known human pathogens, but these may not be common in metagenomes of healthy humans. The discrepancy between metagenome and genome data may also be explained by the specific body sites that have been sequenced for metagenome analysis. Some human-associated bacteria possessing these Tle families, such as Massilia timonae (Lindquist et al., 2003), were isolated from areas of the body that are not represented in the available metagenome datasets.

This raises the possibility that not only are general environments selecting for Tle, but there is also specific selection acting in particular niches within these environments. Few of the Tle from human metagenomes come from the gut, which is largely explained by the paucity of the Tle5 family there. The gut contributes less than 8% of the Tle5 that would be expected based on the number of Proteobacterial genes present. In contrast, Tle5 is highly represented in the mouth. As this is a highly specific niche, we assessed the divergence of Tle sequences found at this site. BLAST analyses indicate that most of the Tle5 in this niche are homologous to Tle from the genera Haemophilus and Aggregatibacter. Experimental evidence will be required to determine whether the frequency of Tle5 contributes to the prevalence of these bacteria within the mouth, or whether any other Tle family would serve equally well.

As Tle is likely to be horizontally transferred, the differential abundance of Tle families could also be due to a founder effect. To test this hypothesis, the niche of origin of each Tle was noted on phylogenetic trees of Tle sequences (Figs S2–S6). Tle phylogeny was not well correlated with niche, except in cases of Tle from highly related strains in the same niche. This is inconsistent with a founder effect.
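The founder-effect check above was made by noting where each niche falls on the Tle trees. A quantitative complement, assumed here rather than taken from the study, is to ask whether sequences from the same niche are closer to one another than expected by chance, given a matrix of pairwise distances (for example patristic distances read from a tree). The sketch below handles two niches and enumerates every relabelling with the same group sizes, so the p-value is exact but the approach is only practical for small numbers of sequences; larger sets would need random permutations.

```python
from itertools import combinations


def between_minus_within(dist, labels):
    """Mean between-niche distance minus mean within-niche distance."""
    within, between = [], []
    for i, j in combinations(range(len(labels)), 2):
        (within if labels[i] == labels[j] else between).append(dist[i][j])
    if not within or not between:
        raise ValueError("need at least one within-niche and one between-niche pair")
    return sum(between) / len(between) - sum(within) / len(within)


def exact_niche_clustering_test(dist, labels):
    """Return (observed statistic, exact one-sided p-value) for exactly two niches.
    Large positive statistics mean sequences cluster by niche."""
    niches = sorted(set(labels))
    if len(niches) != 2:
        raise ValueError("this sketch supports exactly two niches")
    n = len(labels)
    size_a = labels.count(niches[0])
    observed = between_minus_within(dist, labels)
    as_extreme = total = 0
    for group_a in combinations(range(n), size_a):
        members = set(group_a)
        relabelled = [niches[0] if i in members else niches[1] for i in range(n)]
        total += 1
        if between_minus_within(dist, relabelled) >= observed - 1e-12:
            as_extreme += 1
    return observed, as_extreme / total


if __name__ == "__main__":
    # Toy distances: two tight groups of four sequences each
    labels = ["human"] * 4 + ["arthropod"] * 4
    dist = [[0 if i == j else (1 if (i < 4) == (j < 4) else 10) for j in range(8)]
            for i in range(8)]
    print(exact_niche_clustering_test(dist, labels))
```

A small p-value would indicate niche clustering of the kind a founder effect could produce; the observation reported in the text (no strong correlation beyond closely related strains) would correspond to a large p-value.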

Furthermore, instances in which members of the same species independently acquired members of the same Tle family suggest a degree of selective pressure to obtain these genes. P. aeruginosa strains have several tle genes, but closely related species such as P. resinovorans or P. thermotolerans have few or no Tle, which may be due to the fact that P. aeruginosa inhabits several different niches. Other Tle profiles from genomes are in agreement with the metagenomic data. Tle2 is very frequent both in rhizosphere metagenomes and in soil-dwelling P. fluorescens species, while being much less common in other Pseudomonas species. Tle1 and Tle2 are enriched in aquatic metagenomes and also in Vibrio species.

An interesting feature of the analysis was the finding that some metagenomic Tle1 sequences were homologous to lipases found in the phylum Bacteroidetes. These lipases appear to be genuine effectors of another phage-derived secretion system, similar to the Photorhabdus virulence cassette/anti-feeding island, which was hypothesised to be a divergent T6SS (Yang et al., 2006; Penz et al., 2010; Zhang et al., 2012). This hypothesis was recently confirmed (Russell, Wexler, et al., 2014). As this system is less common than the T6SS in genomic sequences (see, for example, Persson et al., 2009) and Bacteroidetes are less common than Proteobacteria in metagenomic datasets (data not shown), most Tle1 hits obtained from metagenomes are expected to be T6SS-associated. However, BLASTP searches suggest that approximately half of the human-associated Tle1 sequences (roughly 10% of all Tle sequences from human metagenomes) are from Bacteroidetes, possibly reflecting the prevalence of Bacteroidetes in the mouth. Conversely, BLASTP searches suggest that very few of the Tle1 sequences from other metagenomes are from Bacteroidetes.


Divergence within Tle families

During the BLAST analysis of Tle5, it became clear that two clades were emerging, with several sequences quite divergent from the majority. A maximum likelihood phylogenetic tree revealed a clear divergence between two branches of Tle5 proteins, hereinafter referred to as Tle5a (which includes PldA) and Tle5b (which includes PldB from P. aeruginosa and PLD1 from Klebsiella pneumoniae) (Fig. 4) (Jiang et al., 2014; Lery et al., 2014). This split between Tle5 sequences is congruent with the original phylogenetic analysis of Tle5 published by Russell and colleagues and with a recent analysis published by Jiang and colleagues (Russell et al., 2013; Jiang et al., 2014). The dissimilarity between these branches is such that members of the Tle5a sub-cluster, which includes the previously characterised PldA/Tle5 from PAO1, are more similar to phospholipases from eukaryotes than they are to their Tle5b counterparts. Tle5a are much less common in both genome and metagenome datasets, being infrequent in most niches.

An examination of the genomic context of Tle5 in genome sequences shows that this divergence is also evident in the adjacently encoded immunity genes. Tle5b genes are encoded next to putative immunity genes with Sel1 domains, which are relatively conserved, while the less common Tle5a genes are encoded next to more divergent immunity proteins that have little homology to the Sel1-containing proteins. Indeed, Tle5b from K. pneumoniae was reported to lack a cognate immunity gene, owing to the lack of similarity between the putative immunity gene Tli5b and the previously described Tli5a immunity protein (Lery et al., 2014). Protein structures can be similar in the absence of sequence similarity, but Tli5a and Tli5b were predicted to have different structures by the modelling tool Phyre2 (Table S2) (Kelley and Sternberg, 2009). In addition, it was recently shown that the Tli5a and Tli5b immunity proteins from P. aeruginosa are only active against Tle5a and Tle5b, respectively. Therefore, it might be useful to consider Tle5 as being composed of two sub-groups, Tle5a and Tle5b (Fig. 4).


Sequence analysis reveals conservation of motifs with residue divergence in all families of Tle in diverse ecological niches

If Tle niche specialisation occurs, selective pressure might be expected to manifest itself in the amino acid sequences of Tle from different environments. Therefore, Tle amino acid sequences from available metagenome datasets were compared, focusing on the niche-specific conservation of residues. The five Tle lipase families contain either the GxSxG motif or dual HxKxxxxD motifs. Several additional conserved motifs exist in each specific family and, in line with our expectations, the areas of greatest conservation were observed in domains predicted to have lipase activity (Fig. 5; Files S2–S10). However, inter-familial alignments of Tle proteins did not identify motifs common to all families, apart from the conserved GxSxG motif shared by Tle1–4.
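Screening sequences for the catalytic motifs mentioned above is straightforward with regular expressions in which x stands for any residue. The sketch below uses zero-width lookaheads so that overlapping occurrences are all reported. Encoding the Tle5 motif as the PLD-superfamily HKD motif, HxK(x)4D, is a modelling choice here, and a regex hit only flags a candidate catalytic site; it does not establish lipase activity.

```python
import re

# x = any residue; the lookahead (?=...) lets overlapping matches be found
MOTIFS = {
    "GxSxG": re.compile(r"(?=G.S.G)"),        # lipase/esterase nucleophile motif (Tle1-4)
    "HxKxxxxD": re.compile(r"(?=H.K.{4}D)"),  # PLD-type HKD motif (Tle5 carries two)
}


def find_motifs(protein):
    """Map each motif name to the 1-based start positions of its occurrences."""
    seq = protein.upper()
    return {name: [m.start() + 1 for m in pattern.finditer(seq)]
            for name, pattern in MOTIFS.items()}


if __name__ == "__main__":
    toy = "MKGASAGTSAGLLHAKLLLLDPPHRKAAAAD"  # made-up sequence, not a real Tle
    print(find_motifs(toy))  # {'GxSxG': [3, 7], 'HxKxxxxD': [14, 24]}
```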

The lack of motifs conserved across all Tle families supports the hypothesis that these families have unique features. These unique features may be amenable to selection by specific niches, resulting in the differential Tle family abundance reported above. To investigate this, sequences of the same Tle family from different niches were compared and analysed for changes in highly conserved amino acid residues. Several niche-specific conserved residues were identified, including position 2596 in the Tle5 alignment, where the polar uncharged amino acid asparagine (N) dominates in sequences from human metagenomes but the positively charged histidine (H) dominates in sequences from arthropod metagenomes (Fig. 6). More frequently, residues were highly conserved in sequences from one environment but not in sequences from another (Fig. 6). Owing to the partial nature of many metagenome sequences, coverage is limited at many positions, which may mask other niche-specific residue changes.
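The residue comparison described above (for example, N in human versus H in arthropod sequences at one Tle5 alignment position) can be sketched as a per-column scan of an alignment split by niche. In this hypothetical implementation, a column counts as conserved in a niche when its most common residue reaches a chosen threshold among non-gap characters, and columns with too few non-gap residues are skipped to reflect the limited coverage of partial metagenomic sequences; both thresholds are arbitrary assumptions. "switch" columns are conserved in every niche but with different residues, and "one-sided" columns are conserved in some niches only.

```python
from collections import Counter

GAPS = set("-.")


def column_consensus(seqs, col, min_coverage):
    """Most common residue at `col` and its frequency among non-gap characters."""
    residues = [s[col] for s in seqs if col < len(s) and s[col] not in GAPS]
    if len(residues) < min_coverage:
        return None  # too little coverage to judge conservation
    residue, count = Counter(residues).most_common(1)[0]
    return residue, count / len(residues)


def niche_specific_columns(alignment_by_niche, threshold=0.8, min_coverage=5):
    """Return a list of (1-based column, kind, {niche: conserved residue})."""
    niches = sorted(alignment_by_niche)
    width = max(len(s) for seqs in alignment_by_niche.values() for s in seqs)
    results = []
    for col in range(width):
        profiles = {n: column_consensus(alignment_by_niche[n], col, min_coverage)
                    for n in niches}
        if any(p is None for p in profiles.values()):
            continue
        conserved = {n: p[0] for n, p in profiles.items() if p[1] >= threshold}
        if not conserved:
            continue
        if len(conserved) < len(niches):
            results.append((col + 1, "one-sided", conserved))
        elif len(set(conserved.values())) > 1:
            results.append((col + 1, "switch", conserved))
    return results


if __name__ == "__main__":
    toy = {
        "human": ["ANK"] * 5,
        "arthropod": ["AHK", "AHR", "AHE", "AHD", "AHQ"],
    }
    for hit in niche_specific_columns(toy):
        print(hit)
```

Column numbers refer to positions in the multiple alignment, not in any single protein, which is also how a position such as 2596 in the Tle5 alignment should be read.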


Different environments select for certain species, and whether the variation in Tle residues from different niches is due to genetic drift in these species or to selective pressure driving adaptation will remain unclear until further functional and bioinformatic studies are completed. Although our data do not support a founder effect, demonstrating convergent evolution of each polymorphism is currently limited by the available metagenomic datasets, in which individual species or genera dominate the genetic information from each niche. While convergent evolution would be the logical outcome of selective pressure at certain residue sites, more sequencing data, and possibly experimental data, are needed before this can be demonstrated. Nevertheless, these data are consistent with the hypothesis that selective pressure within different environments may manifest in divergent Tle sequences.


Conclusion

Analysis of the distribution of a widespread effector superfamily is a novel and potentially more accurate way of examining levels of T6SS-mediated competition and killing within environments. The differing distribution patterns of the Tle superfamily across niches offer insights into bacterial activity within those niches. Moreover, the differential abundance of Tle families within certain niches, as well as the presence of different Tle families within the same genomes, suggests specialisation of the various Tle families. Indeed, a recent report identified niche specialisation of the twin-arginine translocation system, with different levels of selective pressure on the TatC protein, concentrated at certain positions in the protein, evident in different strains (Simone et al., 2013). This suggests that such phenomena may be widespread among bacterial secretion systems. While the wealth of metagenomic data now available presents an opportunity to study ecosystems directly, environmental factors can be diverse even within niches, and increased sampling will further inform the analysis and conclusions of this study.


Acknowledgments


The authors would like to thank Dr. Marlies Mooij for valuable discussions.

This research was supported in part by grants awarded by the European Commission (FP7-PEOPLE-2013-ITN, 607786; FP7-KBBE-2012-6, CP-TP-312184; FP7-KBBE-2012-6, 311975; OCEAN 2011-2, 287589; Marie Curie 256596), Science Foundation Ireland (SSPC-2, 12/RC/2275; 13/TIDA/B2625; 12/TIDA/B2411; 12/TIDA/B2405; 07/IN.1/B948; 09/RFP/BMT2350), the Department of Agriculture and Food (FIRM/RSF/CoFoRD; FIRM 08/RDC/629), the Irish Research Council for Science, Engineering and Technology (PD/2011/2414; RS/2010/2413), the Health Research Board (HRA/2009/146), the Environmental Protection Agency (EPA2008-PhD-S-2), the Marine Institute (Beaufort award C2CRA 2007/082), Teagasc (Walsh Fellowship 2013) and the Higher Education Authority of Ireland (PRTLI4).

References

376 377 378 379

Aschtgen, M.-S., Bernard, C.S., Bentzmann, S.D., Lloubès, R., and Cascales, E. (2008) SciN Is an Outer Membrane Lipoprotein Required for Type VI Secretion in Enteroaggregative Escherichia coli. J. Bacteriol. 190: 7523–7531.

380 381 382 383

Barret, M., Egan, F., Fargier, E., Morrissey, J.P., and O’Gara, F. (2011) Genomic analysis of the type VI secretion systems in Pseudomonas spp.: novel clusters and putative effectors uncovered. Microbiology 157: 1726–1739.

Barret, M., Egan, F., and O’Gara, F. (2013) Distribution and diversity of bacterial secretion systems across metagenomic datasets. Environ. Microbiol. Rep. 5: 117–126.

Crooks, G.E., Hon, G., Chandonia, J.-M., and Brenner, S.E. (2004) WebLogo: A Sequence Logo Generator. Genome Res. 14: 1188–1190.

Dong, T.G., Ho, B.T., Yoder-Himes, D.R., and Mekalanos, J.J. (2013) Identification of T6SS-dependent effector and immunity proteins by Tn-seq in Vibrio cholerae. Proc. Natl. Acad. Sci. 110: 2623–2628.

Enos-Berlage, J.L., Guvener, Z.T., Keenan, C.E., and McCarter, L.L. (2005) Genetic determinants of biofilm development of opaque and translucent Vibrio parahaemolyticus. Mol. Microbiol. 55: 1160–1182.

Filloux, A. (2013) The rise of the Type VI secretion system. F1000Prime Rep. 5:

Gueguen, E., Durand, E., Zhang, X.Y., d’ Amalric, Q., Journet, L., and Cascales, E. (2013) Expression of a Yersinia pseudotuberculosis Type VI Secretion System Is Responsive to Envelope Stresses through the OmpR Transcriptional Activator. PLoS ONE 8: e66615.

Huang, Y., Niu, B., Gao, Y., Fu, L., and Li, W. (2010) CD-HIT Suite: a web server for clustering and comparing biological sequences. Bioinformatics 26: 680–682.

Jiang, F., Waterfield, N.R., Yang, J., Yang, G., and Jin, Q. (2014) A Pseudomonas aeruginosa Type VI Secretion Phospholipase D Effector Targets Both Prokaryotic and Eukaryotic Cells. Cell Host Microbe 15: 600–610.

Kelley, L.A. and Sternberg, M.J.E. (2009) Protein structure prediction on the Web: a case study using the Phyre server. Nat. Protoc. 4: 363–371.

Lery, L.M., Frangeul, L., Tomas, A., Passet, V., Almeida, A.S., Bialek-Davenet, S., et al. (2014) Comparative analysis of Klebsiella pneumoniae genomes identifies a phospholipase D family protein as a novel virulence factor. BMC Biol. 12: 41.

Lindquist, D., Murrill, D., Burran, W.P., Winans, G., Janda, J.M., and Probert, W. (2003) Characteristics of Massilia timonae and Massilia timonae-Like Isolates from Human Patients, with an Emended Description of the Species. J. Clin. Microbiol. 41: 192–196.

Ma, L.-S., Hachani, A., Lin, J.-S., Filloux, A., and Lai, E.-M. (2014) Agrobacterium tumefaciens Deploys a Superfamily of Type VI Secretion DNase Effectors as Weapons for Interbacterial Competition In Planta. Cell Host Microbe 16: 94–104.

Penz, T., Horn, M., and Schmitz-Esser, S. (2010) The genome of the amoeba symbiont “Candidatus Amoebophilus asiaticus” encodes an afp-like prophage possibly used for protein secretion. Virulence 1: 541–545.

Persson, O.P., Pinhassi, J., Riemann, L., Marklund, B.-I., Rhen, M., Normark, S., et al. (2009) High abundance of virulence gene homologues in marine bacteria. Environ. Microbiol. 11: 1348–1357.

Russell, A.B., LeRoux, M., Hathazi, K., Agnello, D.M., Ishikawa, T., Wiggins, P.A., et al. (2013) Diverse type VI secretion phospholipases are functionally plastic antibacterial effectors. Nature 496: 508–512.

Russell, A.B., Peterson, S.B., and Mougous, J.D. (2014) Type VI secretion system effectors: poisons with a purpose. Nat. Rev. Microbiol. 12: 137–148.

Russell, A.B., Wexler, A.G., Harding, B.N., Whitney, J.C., Bohn, A.J., Goo, Y.A., et al. (2014) A Type VI Secretion-Related Pathway in Bacteroidetes Mediates Interbacterial Antagonism. Cell Host Microbe 16: 227–236.

Simone, D., Bay, D.C., Leach, T., and Turner, R.J. (2013) Diversity and Evolution of Bacterial Twin Arginine Translocase Protein, TatC, Reveals a Protein Secretion System That Is Evolving to Fit Its Environmental Niche. PLoS ONE 8: e78742.

Tamura, K., Peterson, D., Peterson, N., Stecher, G., Nei, M., and Kumar, S. (2011) MEGA5: Molecular Evolutionary Genetics Analysis Using Maximum Likelihood, Evolutionary Distance, and Maximum Parsimony Methods. Mol. Biol. Evol. 28: 2731–2739.

Waterfield, N.R., Sanchez-Contreras, M., Eleftherianos, I., Dowling, A., Yang, G., Wilkinson, P., et al. (2008) Rapid Virulence Annotation (RVA): Identification of virulence factors using a bacterial genome library and multiple invertebrate hosts. Proc. Natl. Acad. Sci. 105: 15967–15972.

Weber, B., Hasic, M., Chen, C., Wai, S.N., and Milton, D.L. (2009) Type VI secretion modulates quorum sensing and stress response in Vibrio anguillarum. Environ. Microbiol. 11: 3018–3028.

Yang, G., Dowling, A.J., Gerike, U., ffrench-Constant, R.H., and Waterfield, N.R. (2006) Photorhabdus Virulence Cassettes Confer Injectable Insecticidal Activity against the Wax Moth. J. Bacteriol. 188: 2254–2261.

Yang, G., Hernández-Rodríguez, C.S., Beeton, M.L., Wilkinson, P., ffrench-Constant, R.H., and Waterfield, N.R. (2012) Pdl1 Is a Putative Lipase that Enhances Photorhabdus Toxin Complex Secretion. PLoS Pathog. 8: e1002692.

Zhang, D., Souza, R.F. de, Anantharaman, V., Iyer, L.M., and Aravind, L. (2012) Polymorphic toxin systems: Comprehensive characterization of trafficking modes, processing, mechanisms of action, immunity and ecology using comparative genomics. Biol. Direct 7: 1–76.

Zhang, W., Wang, Y., Song, Y., Wang, T., Xu, S., Peng, Z., et al. (2013) A type VI secretion system regulated by OmpR in Yersinia pseudotuberculosis functions to maintain intracellular pH homeostasis. Environ. Microbiol. 15: 557–569.

Table and figure legends

Figure 1. Total Tle abundance (Tle/proteobacterial gene count) in environmental datasets shows variation, with high frequency of Tle correlated with host-associated metagenomes.
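The normalisation behind Figure 1 divides the Tle hits in each metagenome by the number of predicted proteobacterial genes in that metagenome, so that datasets of different sequencing depth can be compared. A minimal Python sketch of this arithmetic follows; the helper `tle_abundance` and the counts are hypothetical illustrations, not the study's code or data.

```python
def tle_abundance(tle_hits: int, proteobacterial_genes: int) -> float:
    """Return Tle hits per predicted proteobacterial gene.

    Hypothetical helper illustrating the Figure 1 normalisation; raises
    ValueError if there are no proteobacterial genes to normalise by.
    """
    if proteobacterial_genes <= 0:
        raise ValueError("proteobacterial gene count must be positive")
    return tle_hits / proteobacterial_genes


# Made-up counts: (Tle hits, predicted proteobacterial genes) per metagenome.
samples = {
    "metagenome_A": (12, 40_000),
    "metagenome_B": (30, 50_000),
}
for name, (hits, genes) in samples.items():
    print(name, f"{tle_abundance(hits, genes):.2e}")
```

In this made-up example, metagenome_B has 2.5 times as many raw Tle hits as metagenome_A but only twice the normalised abundance, because it also contains more proteobacterial genes.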

Figure 2. (A) T6SS abundance in environmental datasets was calculated in the same way as Tle abundance (T6SS gene COG3516/proteobacterial gene count). (B) Comparison of T6SS and Tle frequency shows that T6SS and Tle abundance are dissimilar. Because T6SS numbers vary little between datasets, Tle genes per T6SS are highest in host-associated metagenomes, mirroring the overall Tle distribution.
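Because the T6SS abundance in Figure 2 shares the proteobacterial denominator used for Tle abundance, that count cancels when Tle genes per T6SS are computed, leaving the ratio of raw Tle hits to raw COG3516 hits. The Python sketch below makes the cancellation explicit; the class `MetagenomeCounts`, its fields and the counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MetagenomeCounts:
    """Hypothetical per-metagenome counts illustrating the Figure 2 metrics."""
    tle_hits: int
    t6ss_hits: int  # hits to the T6SS marker gene COG3516
    proteobacterial_genes: int

    def tle_abundance(self) -> float:
        return self.tle_hits / self.proteobacterial_genes

    def t6ss_abundance(self) -> float:
        # Same normalisation as Tle abundance, applied to COG3516 hits.
        return self.t6ss_hits / self.proteobacterial_genes

    def tle_per_t6ss(self) -> float:
        # The proteobacterial denominator cancels, so this equals
        # tle_abundance() / t6ss_abundance(). Raises ZeroDivisionError
        # if no COG3516 hits were found.
        return self.tle_hits / self.t6ss_hits


m = MetagenomeCounts(tle_hits=30, t6ss_hits=10, proteobacterial_genes=50_000)
print(m.tle_abundance(), m.t6ss_abundance(), m.tle_per_t6ss())
```

When COG3516 counts vary little between metagenomes, the denominator of `tle_per_t6ss` is close to constant, which is why Tle genes per T6SS follow the overall Tle distribution.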

Figure 3. The frequency of Tle families in metagenomic data from various environmental categories shows marked variation in Tle family distribution across several environments. The highest numbers of three families occur in arthropods, while Tle2 and Tle5 are most common in the rhizosphere and in the human microbiome. Data are again displayed as Tle hits per proteobacterial gene; Tle family abundance across all metagenomes, weighted by proteobacterial abundance in each contributing environment, is reflected by the aggregate bar length for each Tle family.
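The weighting in the Figure 3 legend can be read as a proteobacteria-weighted mean of the per-environment frequencies. Under that reading (an assumption, since the exact calculation is not spelled out here), weighting each environment's hits-per-gene value by its proteobacterial gene count reduces to pooled hits divided by pooled proteobacterial genes, as the hypothetical Python sketch below checks numerically. The function name, environment labels and counts are invented.

```python
def weighted_family_abundance(per_environment: dict[str, tuple[int, int]]) -> float:
    """Proteobacteria-weighted mean of per-environment Tle family frequencies.

    per_environment maps an environment name to (tle_family_hits,
    proteobacterial_genes). Each environment's frequency hits/genes is
    weighted by its proteobacterial gene count (hypothetical helper).
    """
    total_weight = sum(genes for _, genes in per_environment.values())
    weighted_sum = sum(
        (hits / genes) * genes for hits, genes in per_environment.values()
    )
    return weighted_sum / total_weight


envs = {
    "arthropod": (40, 20_000),
    "rhizosphere": (15, 30_000),
    "human": (5, 50_000),
}
# Pooled estimate: all hits over all proteobacterial genes.
pooled = sum(h for h, _ in envs.values()) / sum(g for _, g in envs.values())
print(weighted_family_abundance(envs), pooled)
```

The two values agree (up to floating-point rounding), so environments that contribute more proteobacterial genes contribute proportionally more to the aggregate for each family.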

Figure 4. Maximum likelihood trees of Tle5 (A) and Tli5 (B) proteins from genome sequences were generated with MEGA5 using the WAG amino acid substitution model with 1000 bootstrap replicates (displayed as percentages) (Tamura et al., 2011). The Tle5a proteins and their corresponding immunity proteins are coloured in red and group separately from the Tle5b and Tli5b proteins. Because Tle5b proteins are much more common than Tle5a proteins, only representative Tle5b sequences were used in the tree.
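The bootstrap values in Figure 4 come from re-inferring the tree on alignments whose columns have been resampled with replacement. MEGA5 does this internally; the Python sketch below shows only the resampling step and how a clade count becomes a percentage. Tree inference under the WAG model is not reproduced, and the functions, the toy alignment and the clade count of 870 are hypothetical.

```python
import random


def bootstrap_alignment(alignment: list[str], rng: random.Random) -> list[str]:
    """Return one nonparametric bootstrap replicate of an alignment.

    Columns are sampled with replacement to the original length, and every
    sequence uses the same sampled columns so rows stay aligned.
    """
    length = len(alignment[0])
    cols = [rng.randrange(length) for _ in range(length)]
    return ["".join(seq[c] for c in cols) for seq in alignment]


def support_percent(clade_hits: int, replicates: int) -> float:
    """Bootstrap support: share of replicate trees containing a clade, as %."""
    return 100.0 * clade_hits / replicates


rng = random.Random(42)
aln = ["MKLV-ST", "MKIVAST", "MRLV-SS"]  # toy protein alignment
replicates = [bootstrap_alignment(aln, rng) for _ in range(1000)]
# A tree would be inferred from each replicate; here we only report support
# for a clade assumed to appear in 870 of the 1000 replicate trees.
print(len(replicates), support_percent(870, 1000))
```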

Figure 5. WebLogo images showing consensus motifs of Tle5 protein sequences. Upper logos represent sequences from arthropod metagenomes, while lower logos represent sequences from human metagenomes. Genomic sequences for each Tle were aligned using MAFFT E-INS-i with the BLOSUM62 substitution matrix (Kato et al., 2013), and metagenomic sequences were added to the genomic alignment using the add fragments option on the MAFFT server. Consensus sequences were derived using WebLogo 3 (Crooks et al., 2004) after a non-redundant set of sequences had been generated with CD-HIT to eliminate sequences showing 100% identity (Huang et al., 2010).
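The redundancy-removal and consensus steps in Figure 5 can be sketched in Python as below. This is not CD-HIT or WebLogo code: exact-duplicate removal stands in for CD-HIT clustering at 100% identity, and per-column majority residues with a simple information-content score (log2(20) minus Shannon entropy, without any small-sample correction) stand in for the logo. The function names and the short GxSxG-like toy alignment are hypothetical.

```python
import math
from collections import Counter


def dedupe_identical(seqs: list[str]) -> list[str]:
    """Drop exact duplicates, keeping the first occurrence of each sequence.

    A simplification of CD-HIT at 100% identity, which can also absorb a
    shorter sequence that is identical to part of a longer one.
    """
    seen: set[str] = set()
    unique = []
    for s in seqs:
        if s not in seen:
            seen.add(s)
            unique.append(s)
    return unique


def column_profile(aligned: list[str]) -> list[tuple[str, float]]:
    """Majority residue and information content (bits) for each column.

    Gaps ('-') are ignored; ties go to the residue seen first.
    """
    profile = []
    for column in zip(*aligned):
        counts = Counter(r for r in column if r != "-")
        total = sum(counts.values())
        if total == 0:
            profile.append(("-", 0.0))
            continue
        entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
        residue = counts.most_common(1)[0][0]
        profile.append((residue, math.log2(20) - entropy))
    return profile


seqs = ["GHSLG", "GHSLG", "GHSQG", "GYSMG"]  # toy aligned fragments
unique = dedupe_identical(seqs)
consensus = "".join(res for res, _ in column_profile(unique))
print(unique, consensus)
```

Fully conserved columns (here the G, S and G positions) score the maximum of log2(20) bits, which is what makes such residues stand out in a sequence logo.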

Figure 6. Consensus sequences for two regions in (A) Tle2 and (B) Tle5 show differences in conserved residues between sequences found in different environments. For Tle2, sequences from arthropod metagenomes (top) are compared with sequences from rhizosphere metagenomes (bottom). For Tle5, sequences from arthropod metagenomes (top) are compared with sequences from human metagenomes.

Table S1. Tle and proteobacterial gene counts in assembled metagenomes from different environments. Predicted proteobacterial genes in metagenomes were calculated using the metagenomes vs genomes tool on IMG with a 60% identity cut-off.
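The 60% identity cut-off used to predict proteobacterial genes amounts to a filter on each gene's best hit against reference genomes. The Python sketch below shows such a filter; the `BestHit` record, its field names and the inclusive boundary are assumptions, and the IMG metagenomes vs genomes tool performs its own assignment.

```python
from typing import NamedTuple


class BestHit(NamedTuple):
    """Hypothetical best-hit record for one metagenome gene."""
    gene_id: str
    percent_identity: float
    phylum: str


def count_proteobacterial(hits: list[BestHit], min_identity: float = 60.0) -> int:
    """Count genes whose best genome hit is proteobacterial at >= min_identity %."""
    return sum(
        1
        for h in hits
        if h.phylum == "Proteobacteria" and h.percent_identity >= min_identity
    )


hits = [
    BestHit("g1", 92.5, "Proteobacteria"),
    BestHit("g2", 58.0, "Proteobacteria"),  # below the cut-off
    BestHit("g3", 75.0, "Firmicutes"),      # not proteobacterial
    BestHit("g4", 60.0, "Proteobacteria"),  # exactly at the cut-off
]
print(count_proteobacterial(hits))
```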

Table S2. Tle and Tli amino acid sequences from P. aeruginosa were analysed using Phyre2 (Kelley and Sternberg, 2009). The best Phyre2 hit is shown for each sequence except Tli5b, for which the second-best hit is also shown because it is also a high-confidence hit and is T6SS-associated. Interestingly, both Tli4 and Tli5a are modelled with high confidence on the Mog1/PspB structure, which gene-neighbourhood linkage strongly suggests is associated with the T6SS. These proteins currently have no known role in the T6SS but are hypothesised to act as adaptors between the T6SS apparatus and its effectors (Zhang et al., 2012). The model confidence, the percentage identity shared between query sequence and template, and the percentage of the query length that is modelled are displayed. As expected, all Tle sequences were best matched to other lipases.
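Table S2 reports, alongside model confidence, the percentage identity between query and template and the percentage of the query length that is modelled. One common way to compute these two values from a pairwise alignment is sketched in Python below; Phyre2 reports its own values, and the function `alignment_metrics` and the toy alignment are hypothetical.

```python
def alignment_metrics(
    query_aln: str, template_aln: str, query_length: int
) -> tuple[float, float]:
    """Percent identity and percent query coverage from a pairwise alignment.

    query_aln and template_aln are equal-length aligned strings with '-' gaps.
    Identity is counted over columns where both residues are present;
    coverage is the share of the full query length aligned to the template.
    """
    if len(query_aln) != len(template_aln):
        raise ValueError("aligned strings must have equal length")
    if query_length <= 0:
        raise ValueError("query_length must be positive")
    paired = [
        (q, t) for q, t in zip(query_aln, template_aln) if q != "-" and t != "-"
    ]
    identical = sum(1 for q, t in paired if q == t)
    identity = 100.0 * identical / len(paired) if paired else 0.0
    coverage = 100.0 * len(paired) / query_length
    return identity, coverage


# Toy example: a 10-residue query, 7 columns of which appear in the alignment.
identity, coverage = alignment_metrics("MKLVAST", "MRLV-ST", query_length=10)
print(round(identity, 1), coverage)
```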

Fig. S1. Proteobacterial composition of human-associated metagenomes. Predicted proteobacterial genes in metagenomes were calculated using the metagenomes vs genomes tool on IMG with a 60% identity cut-off.

Fig. S2-S6. Bootstrap consensus maximum likelihood trees of Tle families, noting the habitat from which each bacterium was isolated: human (H), fresh water (F), marine (M), plant (P) and arthropod (A). The similarity of Tle sequences from different environments argues against the distribution of Tle families in metagenomes being due to a founder effect. Trees of proteins from genome sequences were generated with MEGA5 using the WAG amino acid substitution model with 500 bootstrap replicates (Tamura et al., 2011).

File S1. Bait sequences for BLAST analyses of metagenomes.

File S2. Consensus sequences of Tle1 from arthropod metagenomes.

File S3. Consensus sequences of Tle2 from arthropod metagenomes.

File S4. Consensus sequences of Tle3 from arthropod metagenomes.

File S5. Consensus sequences of Tle4 from arthropod metagenomes.

File S6. Consensus sequences of Tle5 from arthropod metagenomes.

File S7. Consensus sequences of Tle2 from human metagenomes.

File S8. Consensus sequences of Tle4 from human metagenomes.

File S9. Consensus sequences of Tle5 from human metagenomes.

File S10. Consensus sequences of Tle2 from rhizosphere metagenomes.
