G Model

ARTICLE IN PRESS

BIOTEC 6655 1–12

Journal of Biotechnology xxx (2014) xxx–xxx

Contents lists available at ScienceDirect

Journal of Biotechnology journal homepage: www.elsevier.com/locate/jbiotec

Characterization of new bacterial catabolic genes and mobile genetic elements by high throughput genetic screening of a soil metagenomic library


Samuel Jacquiod a,∗, Sandrine Demanèche a, Luka Ausec b, Zhuofei Xu c, Tom O. Delmont a, Vincent Dunon d, Christine Cagnon e, Ines Mandic-Mulec b, Timothy M. Vogel a, Pascal Simonet a, Laure Franqueville a

a Environmental Microbial Genomics Group, Laboratoire Ampère, CNRS, École Centrale de Lyon, Université de Lyon, 36 Avenue Guy de Collongue, 69134 Ecully, France
b Department for Food Science and Technology, Biotechnical Faculty, University of Ljubljana, Ljubljana, Slovenia
c Molecular Microbial Ecology Group, Section of Microbiology, København Universitet, København, Denmark
d Division of Soil and Water Management, Department of Earth and Environmental Sciences, University of Leuven, Kasteelpark Arenberg 20, B-3001 Heverlee, Belgium
e Équipe Environnement et Microbiologie, IBEAS – UFR Sciences et Techniques, Université de Pau et des Pays de l'Adour, 64013 Pau, France


Article info

Article history: Received 6 December 2013; received in revised form 24 March 2014; accepted 28 March 2014; available online xxx

Keywords: Soil; Bacteria; Metagenome; Fosmid library; Mobile genetic elements; Catabolic genes

Abstract

A mix of oligonucleotide probes was used to hybridize soil metagenomic DNA from a fosmid clone library spotted on high-density membranes. The pooled radio-labeled probes were designed to target genes encoding GH18 glycoside hydrolases, dehalogenases, bacterial laccases and mobile genetic elements (integrases from integrons and insertion sequences). Positive hybridizing spots were matched to the corresponding clones in the library, and the metagenomic inserts were sequenced. After assembly and annotation, new coding DNA sequences related to the genes of interest were identified, with low protein similarity to their closest hits in databases. This work highlights the sensitivity of DNA/DNA hybridization as an effective and complementary way to recover novel genes from large metagenomic clone libraries. This study also suggests that some of the identified catabolic genes might be associated with horizontal transfer events. © 2014 Published by Elsevier B.V.

∗ Corresponding author. Current address: Molecular Microbial Ecology Group, Section of Microbiology, København Universitet, København, Denmark. Tel.: +45 70284746. E-mail addresses: [email protected], [email protected] (S. Jacquiod).

Please cite this article in press as: Jacquiod, S., et al., Characterization of new bacterial catabolic genes and mobile genetic elements by high throughput genetic screening of a soil metagenomic library. J. Biotechnol. (2014), http://dx.doi.org/10.1016/j.jbiotec.2014.03.036

1. Introduction

Soils arguably harbor the highest diversity of microbial life on Earth and are therefore a rich but still poorly explored reservoir of genetic resources with considerable potential for downstream industrial applications (Mocali and Benedetti, 2010). Metagenomic approaches explore these resources through deep sequencing and cloning techniques (Vogel et al., 2009; Lefevre et al., 2008). Despite noticeable progress in shotgun sequencing (Foerstner et al., 2008) and PCR amplicon sequencing (Cretoiu et al., 2012), the screening of metagenomic DNA clone libraries is still considered one of the most appropriate and efficient strategies for the discovery of new biocatalytic molecules (Leis et al., 2013). Several vectors are available for library construction, and the choice of vector depends in part on the genetic targets and the final application. Plasmids are used for construction of short insert […]

[…] 50×, indicating high assembly quality and sufficient sequencing depth. Interestingly, contigs displaying actual hybridization hits tended to be clearly larger (30–40 kb) and more deeply covered (50–150×).

3.2. Taxonomical affiliation of contigs

The taxonomical affiliation of the library contigs (Fig. 2), established on the basis of oligonucleotide k-mer frequencies (Patil et al., 2012), showed a plurality of Actinobacteria-related sequences (45/106), mostly dominated by the orders Coriobacteriales (37/106) and Actinomycetales (8/106). Sequences related to unclassified bacteria were the second most abundant group (19/106). Planctomycetes (11/106), Verrucomicrobia (9/106) and Chloroflexi (8/106) were present at similar levels. Proteobacteria (10/106) were mostly represented by Alphaproteobacteria (6/106) and Deltaproteobacteria (2/106). The lowest abundances were observed for Acidobacteria (3/106), together with a single Archaea contig related to Sulfolobaceae (Crenarchaeota). The affiliation of the 42 contigs with identified hybridization hits followed a broadly similar pattern, with a clear dominance of Actinobacteria-related sequences (20 Coriobacteriales and 3 Actinomycetales). Other groups were still represented, including 5 Planctomycetes, 4 Verrucomicrobia, 2 Chloroflexi and 2 sequences affiliated to Deltaproteobacteria and Acidobacteria. Only 2 sequences related to unclassified bacteria were observed. The trimmed and untrimmed taxonomical affiliation profiles of contigs generated from the shotgun dataset with the same technique are very similar to each other, but differ markedly from the library profile: in contrast to the library, they are dominated by unclassified bacterial sequences, Verrucomicrobia and Planctomycetes. Some groups, such as Actinobacteria and Chloroflexi, were observed at lower frequencies in the 454 shotgun profiles than in the library. No Acidobacteria- or Archaea-related sequences were found in the shotgun contigs.
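The affiliation above relies on the fact that oligonucleotide (k-mer) frequencies carry a taxonomic signal. The published method (Patil et al., 2012) is considerably more elaborate; as a much simpler illustration of the underlying idea only, the sketch below builds tetranucleotide (k = 4) frequency profiles and assigns a contig to the most similar of a set of reference profiles by cosine similarity. The function names, the nearest-profile rule and the toy references are assumptions for illustration, not the procedure used in this study.

```python
import math
from collections import Counter
from itertools import product


def kmer_profile(seq: str, k: int = 4) -> list[float]:
    """Relative frequencies of all 4**k DNA k-mers, in lexicographic order.

    k-mers containing characters other than A, C, G or T (e.g. N) are ignored.
    """
    seq = seq.upper()
    alphabet = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[kmer] for kmer in alphabet)
    if total == 0:
        return [0.0] * len(alphabet)
    return [counts[kmer] / total for kmer in alphabet]


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two profile vectors (0.0 if either is empty)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def assign_taxon(contig: str, references: dict[str, str], k: int = 4) -> str:
    """Hypothetical nearest-profile rule: label of the most similar reference."""
    query = kmer_profile(contig, k)
    ref_profiles = {label: kmer_profile(seq, k) for label, seq in references.items()}
    return max(ref_profiles, key=lambda label: cosine_similarity(query, ref_profiles[label]))


# Toy usage with invented sequences (not data from this study).
refs = {"GC-rich taxon": "GCGCGGCCGCGCCGGC" * 20, "AT-rich taxon": "ATATTAATATAATTAT" * 20}
print(assign_taxon("GCCGCGGCGC" * 10, refs))  # GC-rich taxon
```

Real contigs would be compared against profiles derived from many reference genomes, and a trained classifier generally outperforms a single nearest-profile rule.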

3.3. Identification and annotation of targeted CDS

A total of 3218 coding DNA sequences (CDS) were predicted across the 6 reading frames. The majority of detected CDS were annotated as unidentified proteins not classified in SEED subsystems (2636/3218, 82%); of these, 70% (1852/2636) were referenced as unclassified "hypothetical" sequences and 30% (784/2636) as non-classified sequences with actual hits in the database. The remaining 18% of the CDS (582/3218) could be assigned to one of the SEED subsystems. Whether or not the CDS fitted into the SEED classification, the majority of the proteins identified through this approach (1852/3218, 57.6%) were referenced as "hypothetical", meaning that they had no relevant hit in current databases. The integrative sequence analysis performed on the contigs yielded a total of 94 CDS of interest corresponding to the screened genes (Table 2, Table S3 and Table S4), located on 42 different contigs (Figure 3 from File S6). These hits were identified through the combined use of BLASTn, BLASTx, pHMM and ClustalX alignments to locate putative hybridization sites, complemented with RAST outputs to determine whether any of these sites lay in predicted coding regions with coherent annotations. In parallel, as most proteins predicted by RAST remained unannotated, protein domains were detected with Pfam in order to systematically verify and reinforce the RAST annotations, and to detect possible features omitted by RAST (Fig. 1, number 7). Similarities between genes and their respective probes were around 50–65% for the lowest matches (glycosyl hydrolases related to chitin degradation, dehalogenases and the laccase) and up to 80–95% for the highest ones (integron integrases).
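Identity values such as the probe/gene similarities above and the BLASTp identities reported in Table 2 are conventionally computed as the number of identical aligned positions divided by the alignment length, gap columns included (this is how BLAST reports "Identities"). A minimal sketch, assuming the gapped alignment has already been produced by an aligner; the function name is hypothetical, and other tools may use a different denominator (e.g. the shorter sequence length).

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Identical, non-gap columns are counted and divided by the full
    alignment length, so gap columns lower the identity.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("empty alignment")
    identical = sum(
        1
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a == b and a != gap
    )
    return 100.0 * identical / len(aligned_a)


# Toy alignment (invented sequences, not data from this study).
print(percent_identity("ATGC-GTACA", "ATGCAGTTCA"))  # 80.0
```

The same function applies to protein alignments, since it only compares characters column by column.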
Among the 94 CDS, 76 mobile genetic elements were detected, including 50 IS/transposases (TRP01–54, of which 4, TRP07–08 and TRP38–39, were discarded after contig trimming; Table S4), 22 integrases (INT05–26, including 14 phage-related integrases; Table S3) and 4 integron integrases (INT01–04; Table 2). Four glycosyl hydrolase CDS involved in chitin and N-acetylglucosamine (NAG) degradation were found (CHI01, 02, 03 and 07), as well as 4 CDS related to NAG deacetylases (DEA01–04). Eight CDS belonging to the haloacid dehalogenase (HAD) superfamily were found (HAD01, 03–05 and 07–10), as well as 2 neighboring multicopper oxidase CDS (LAC01–02), which probably form a truncated 2-domain laccase. Sixty-eight percent of the CDS (64/94) had significant matches against experimentally characterized proteins in databases. Co-occurrence of mobile genetic elements with other screened catabolic genes on the same contig was observed 5 times, involving a GH20 beta-hexosaminidase (CHI02), 3 haloacid dehalogenases (HAD05, HAD07 and HAD09) and the laccase (File S8). Co-occurrence of IS/transposase and integrase CDS was also observed on 9 contigs.
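Counting these co-occurrences amounts to grouping CDS by contig and flagging contigs that carry both a catabolic CDS and a mobile genetic element CDS. A minimal sketch, using a hypothetical dictionary built from a subset of the CDS-to-contig assignments reported in this article (Table 2 and Sections 3.5–3.7; not the full dataset). Recognizing MGE CDS by their INT/TRP/MGE label prefix is an assumption tied to the naming used here.

```python
from collections import defaultdict

# Subset of CDS -> contig assignments taken from the text (illustrative, not exhaustive).
CDS_TO_CONTIG = {
    "HAD01": 12,
    "CHI01": 21, "DEA02": 21,
    "CHI02": 24, "DEA03": 24, "DEA04": 24, "MGE14": 24, "MGE15": 24, "MGE16": 24,
    "HAD07": 34, "INT10": 34,
    "HAD09": 44, "TRP26": 44,
    "HAD05": 54, "INT16": 54,
    "LAC01": 59, "LAC02": 59, "INT18": 59, "TRP33": 59, "TRP34": 59,
}

# Assumption: mobile genetic element CDS are recognizable by their label prefix.
MGE_PREFIXES = ("INT", "TRP", "MGE")


def cooccurrence_contigs(cds_to_contig: dict[str, int]) -> dict[int, tuple[list[str], list[str]]]:
    """Map each contig carrying both catabolic and MGE CDS to (catabolic, mge) lists."""
    by_contig: dict[int, list[str]] = defaultdict(list)
    for cds, contig in cds_to_contig.items():
        by_contig[contig].append(cds)

    result = {}
    for contig in sorted(by_contig):
        members = by_contig[contig]
        mge = sorted(c for c in members if c.startswith(MGE_PREFIXES))
        catabolic = sorted(c for c in members if not c.startswith(MGE_PREFIXES))
        if mge and catabolic:
            result[contig] = (catabolic, mge)
    return result


for contig, (catabolic, mge) in cooccurrence_contigs(CDS_TO_CONTIG).items():
    print(f"contig#{contig}: {', '.join(catabolic)} with {', '.join(mge)}")
```

On this subset the loop reports five contigs (#24, #34, #44, #54 and #59), matching the five co-occurrence cases described above; contigs #12 and #21, which carry only catabolic CDS, are not reported.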

3.4. Mobile genetic elements hits

The 76 CDS related to mobile genetic elements were identified on 34 different contigs, 16 of which harbored more than 2 CDS (up to 9). As these CDS represent the majority of hybridization hits, the taxonomic affiliation of the 34 contigs harboring MGE is almost identical to the one described above (Fig. 2). Mobile genetic element CDS were embedded in 12 Coriobacteriaceae, 5 Actinomycetales, 5 Planctomycetes, 4 Verrucomicrobia, 1 Chloroflexi, 1 Deltaproteobacteria, 1 Acidobacteria and 2 unclassified bacterial contigs (Fig. 2). Based on best BLASTp hits against UniProtKB, the identity of integrase CDS ranged from 28 to 100%, and that of IS/transposase CDS from 36 to 83%. Integron integrase protein hits were all predicted to be located in the cytoplasm, without any signal peptide (Table 3). All integrase CDS (except INT05 and INT08) matched proteins in CharProtDB (Table S3), while only half of the IS/transposase CDS (26/50) had characterized hits (Table S4). Consistent with the taxonomic origin of the probes, the 4 integron integrase CDS showed nucleic acid similarities with integron integrases of uncultured bacteria closely related to Proteobacteria (e.g. Gammaproteobacteria and Betaproteobacteria). For the IS/transposase CDS, only 18 of the 50 sequences shared nucleic acid similarities with Proteobacteria-related plasmids, mostly from Alphaproteobacteria.
In 9 cases, IS/transposases and integrases were found on the same contig (Table S3 and Table S4): contig#1 (Actinomycetales, INT05 + TRP02), contig#37 (Coriobacteriaceae, INT11 + TRP13–20), contig#39 (Coriobacteriaceae, INT12 + TRP20), contig#41 (Coriobacteriaceae, INT13 + TRP21–25), contig#48 (Coriobacteriaceae, INT14–15 + TRP29–30), contig#54 with dehalogenase HAD05 (Verrucomicrobia, INT16 + TRP32), contig#59 with LAC01 and LAC02 (Coriobacteriaceae, INT18 + TRP33–34), contig#82 (Chloroflexi, INT21–23 + TRP43–47) and contig#104 (Verrucomicrobia, INT24–25 + TRP53–54).

3.5. Chitin degrading enzymes hits

No convincing chitinase hits were found in this study. However, 8 CDS related to glycosyl hydrolases and deacetylases involved in NAG degradation were obtained on 5 different contigs affiliated to Actinobacteria (CHI01–03 and CHI07; DEA01–04, Table 2). CHI CDS displayed protein identities between 44 and 64%, and DEA CDS between 31 and 50%. CHI03 and DEA02 did not show any similarity with characterized proteins. CHI01 and CHI02 are both beta-hexosaminidases (EC 3.2.1.52), from glycosyl hydrolase families GH3 and GH20 respectively, while CHI03 and CHI07 are affiliated to a sugar-binding protein from the GH2 family and a lysozyme from the GH25 family, respectively. DEA01 and DEA02 correspond to polysaccharide deacetylases from the CE4 family, while DEA03 and DEA04 are affiliated to NAG-6-phosphate deacetylases (EC 3.5.1.25). Contig#24, which carries CHI02 and DEA03–04, is affiliated to Actinomycetales and has 3 transposase hits (MGE14, 15 and 16, File S8). Based on amino acid sequence analysis, CHI01 and CHI03 are both predicted to be cytoplasmic proteins, although a signal peptide was identified for CHI03. CHI02 and CHI07 both have signal peptides and seem to be located in the periplasm (Gram-negative) or in membranes/extracellular space (Gram-positive) (Table 3). DEA01 and DEA04 are predicted to be cytoplasmic proteins, while DEA02 and DEA03 seem more related to the periplasmic compartment (Gram-negative) or the membrane/extracellular compartment (Gram-positive). Only DEA02 has an identified signal peptide, indicating probable export.


Table 2
Summary of all coding DNA sequences (CDS) retrieved from the hybridization screening. Each category of screened genes is represented with its protein domains and BLASTp hits against protein databases (E-value, percentage of identity and accession number of protein hits). For clarity's sake, a full list of mobile genetic element CDS is provided in supporting information Table S3 for integrases and Table S4 for IS/transposases. CDS labeled with letters in brackets were found on contigs with mobile genetic elements.

| CDS | Contig | K-mer affiliation | Pfam and TIGRfam protein domains | Protein family | UniProtKB E-value | UniProtKB Id. | UniProtKB AcNum | CharProtDB E-value | CharProtDB Id. | CharProtDB AcNum |
|---|---|---|---|---|---|---|---|---|---|---|
| CHI01 | 21 | Coriobacteriaceae | GH3 N-terminal domain | Beta-hexosaminidase, GH3 family | 5.00E−168 | 64% | B4CZU1 | 4.40E−31 | 48% | CH002672 |
| CHI02(a) | 24 | Actinomycetales | GH20 catalytic domain | Beta-hexosaminidase, GH20 family | 2.00E−147 | 44% | F8ENJ2 | 3.60E−68 | 53% | CH018048 |
| CHI03 | 30 | Coriobacteriaceae | GH42M + GH2 sugar binding (NS) | Probable glycosyl hydrolase family 2, sugar binding | 0.0 | 45% | Q01TX2 | – | – | – |
| CHI07 | 23 | Coriobacteriaceae | GH25 family domain | Lysozyme, GH25 family | 1.00E−85 | 50% | B4CWL0 | 1.40E−08 | 41% | CH122306 |
| DEA01 | 17 | Coriobacteriaceae | Polysaccharide deacetylase | Polysaccharide deacetylase, CE4 superfamily | 2.00E−40 | 31% | M6XTS9 | 1.80E−04 | 45% | CH021854 |
| DEA02 | 21 | Coriobacteriaceae | Polysaccharide deacetylase | Probable polysaccharide deacetylase, CE4 superfamily | 2.00E−113 | 42% | B4D1X8 | – | – | – |
| DEA03(a) | 24 | Actinomycetales | Amidohydrolase domain 4 | NAG-6-phosphate deacetylase (EC 3.5.1.25) | 3.00E−56 | 50% | S4XTP4 | 2.90E−27 | 49% | CH020312 |
| DEA04(a) | 24 | Actinomycetales | Amidohydrolase domain 4 | NAG-6-phosphate deacetylase (EC 3.5.1.25) | 2.00E−20 | 44% | D8IWY4 | 8.70E−03 | 50% | CH020312 |
| HAD01 | 12 | Coriobacteriaceae | HAD2 hydrolase-like domain | Haloacid dehalogenase, HAD superfamily | 5.00E−97 | 70% | L0D7M8 | 5.60E−06 | 42% | CH017701 |
| HAD03 | 17 | Coriobacteriaceae | HAD2 domain (NS) | Probable hydrolase, HAD superfamily (subfamily IIIC) | 0.0 | 57% | B9XSK8 | – | – | – |
| HAD04 | 53 | Coriobacteriaceae | HAD2 hydrolase-like domain | Haloacid dehalogenase, HAD superfamily | 3.00E−85 | 55% | K9RLL0 | 9.80E−06 | 46% | CH000505 |
| HAD05(b) | 54 | Verrucomicrobia | HAD2 hydrolase-like domain | Haloacid dehalogenase, HAD superfamily (subfamily IA) | 1.00E−13 | 67% | A7H6U2 | 1.50E−04 | 57% | CH021559 |
| HAD07(c) | 34 | Coriobacteriaceae | E1–E2 ATPase + HAD | E1–E2 cation-transporting ATPase with HAD domain | 0.0 | 61% | M4NE16 | 8.40E−179 | 62% | CH024556 |
| HAD08 | 35 | Coriobacteriaceae | HAD hydrolase-like domain | Haloacid dehalogenase, HAD superfamily | 8.00E−62 | 59% | B4D0V1 | 1.10E−23 | 59% | CH024795 |
| HAD09(d) | 44 | Acidobacteria | E1–E2 ATPase + HAD | E1–E2 cation-transporting ATPase with HAD domain | 0.0 | 75% | I4C4T5 | 1.90E−188 | 68% | CH007140 |
| HAD10 | 89 | Chloroflexi | HAD hydrolase-like domain | Probable haloacid dehalogenase, HAD superfamily | 9.00E−69 | 46% | U7GQ08 | – | – | – |
| LAC01(e) / LAC02(e) | 59 | Coriobacteriaceae | Multicopper oxidase domain 3 / Multicopper oxidase domain 2 | Probable truncated 2-D laccase | 0.0 | 65% | B4CZ74 | 3.70E−17 | 50% | CH122203 |
| INT01 | 11 | Coriobacteriaceae | TIGR02249 integron-integrase | Integron integrase family | 1.00E−132 | 63% | C3U0K3 | 5.70E−12 | 66% | CH004958 |
| INT02 | 86 | Unclassified bacteria | TIGR02249 integron-integrase | Integron integrase family | 9.00E−30 | 86% | U5DL55 | 2.30E−13 | 77% | CH004958 |
| INT03 | 100 | Coriobacteriaceae | TIGR02249 integron-integrase | Integron integrase family | 2.00E−140 | 65% | C3U0K3 | 1.30E−18 | 72% | CH004958 |
| INT04 | 96 | Planctomycetaceae | TIGR02249 integron-integrase | Integron integrase family | 9.00E−134 | 65% | C3U0K3 | 3.50E−11 | 68% | CH004958 |

Table 3
Protein prediction analysis of metagenomic hits. The table shows protein analysis for the 4 glycosyl hydrolases, the 4 deacetylases, the 8 dehalogenases, the 2 multicopper oxidases and the 4 integron integrases. Predicted sizes were estimated according to Stothard (2000). Signal peptides were predicted for both Gram-positive and Gram-negative types with the SignalP 4.0 tool (Petersen et al., 2011), as was the presence of transmembrane helices (Käll et al., 2004). Subcellular location was predicted with the CELLO tool (Yu et al., 2006) for both Gram-negative and Gram-positive bacterial types (0 < reliability index < 5).

| Category | CDS | Size (kDa) | Predicted helices | Gram-negative: signal peptide | Gram-negative: predicted sub-cellular location | Gram-negative: reliability index | Gram-positive: signal peptide | Gram-positive: predicted sub-cellular location | Gram-positive: reliability index |
|---|---|---|---|---|---|---|---|---|---|
| Chitin and NAG degradation | CHI01 | 42.48 | 0 | − | Cytoplasm | 4.86 | − | Cytoplasm | 4.22 |
| Chitin and NAG degradation | CHI02 | 76.83 | 0 | + | Periplasm/inner membrane | 2.20/1.33 | + | Membrane | 3.27 |
| Chitin and NAG degradation | CHI03 | 100.07 | 0 | + | Cytoplasm/inner membrane | 2.82/1.01 | + | Cytoplasm/membrane | 2.97/1.67 |
| Chitin and NAG degradation | CHI07 | 31.24 | 0 | + | Periplasm/extracellular | 2.12/1.41 | + | Extracellular/membrane | 2.38/2.23 |
| Chitin and NAG degradation | DEA01 | 33.09 | 0 | − | Cytoplasm | 4.29 | − | Cytoplasm | 4.04 |
| Chitin and NAG degradation | DEA02 | 47.50 | 1 | + | Periplasm | 4.12 | + | Membrane | 3.64 |
| Chitin and NAG degradation | DEA03 | 11.50 | 0 | − | Periplasm/cytoplasm | 1.70/1.58 | − | Extracellular/membrane | 2.30/1.31 |
| Chitin and NAG degradation | DEA04 | 25.41 | 0 | − | Cytoplasm | 4.54 | − | Cytoplasm | 4.58 |
| Dehalogenases from HAD superfamily | HAD01 | 23.8 | 0 | − | Cytoplasm | 3.91 | − | Cytoplasm | 3.73 |
| Dehalogenases from HAD superfamily | HAD03 | 65.03 | 0 | − | Cytoplasm | 4.28 | − | Cytoplasm | 4.70 |
| Dehalogenases from HAD superfamily | HAD04 | 27.01 | 0 | − | Cytoplasm | 4.76 | − | Cytoplasm | 4.93 |
| Dehalogenases from HAD superfamily | HAD05 | 15.84 | 0 | − | Cytoplasm | 3.66 | − | Cytoplasm | 3.78 |
| Dehalogenases from HAD superfamily | HAD07 | 90.13 | 9 | − | Inner membrane | 3.88 | − | Membrane | 4.37 |
| Dehalogenases from HAD superfamily | HAD08 | 21.90 | 0 | − | Cytoplasm | 4.41 | − | Cytoplasm | 4.56 |
| Dehalogenases from HAD superfamily | HAD09 | 70.98 | 7 | − | Inner membrane | 4.75 | − | Membrane | 4.82 |
| Dehalogenases from HAD superfamily | HAD10 | 36.90 | 0 | + | Periplasm | 3.85 | + | Cytoplasm/extracellular | 2.03/1.45 |
| Laccase | LAC01 | 25.16 | 0 | + | Periplasm | 3.22 | + | Cytoplasm | 3.04 |
| Laccase | LAC02 | 26.49 | 0 | − | Periplasm | 3.49 | − | Cytoplasm | 3.59 |
| Integron integrase | INT01 | 38.09 | 0 | − | Cytoplasm | 4.67 | − | Cytoplasm | 4.67 |
| Integron integrase | INT02 | 11.19 | 0 | − | Cytoplasm | 2.64 | − | Cytoplasm | 2.64 |
| Integron integrase | INT03 | 36.74 | 0 | − | Cytoplasm | 4.66 | − | Cytoplasm | 4.66 |
| Integron integrase | INT04 | 34.42 | 0 | − | Cytoplasm | 4.72 | − | Cytoplasm | 4.27 |

3.6. Dehalogenases hits

Eight CDS related to the haloacid dehalogenase (HAD) superfamily were obtained on 8 different contigs affiliated to Actinobacteria, Verrucomicrobia, Acidobacteria and Chloroflexi (HAD01, 03–05 and 07–10, Table 2), with protein sequence identities between 46 and 75%. Except HAD03 and HAD10, all dehalogenases share similarities with characterized proteins. HAD07 and HAD09 are multidomain proteins related to known cation-transporting ATPases with HAD-hydrolase domains. Consistent with the probe sequences used, HAD hits share nucleic acid similarities with proteobacterial dehalogenases (e.g. Deltaproteobacteria for HAD07 and 10; Alphaproteobacteria for HAD03 and 09; Betaproteobacteria for HAD10) and also with Actinobacteria for HAD04. However, HAD01 was found to be related to Planctomyces, while HAD05 and HAD08 have no clear affiliation at the nucleic acid level. HAD05, HAD07 and HAD09 were found to co-occur with MGE on the same contigs: contig#34, affiliated to Coriobacteriaceae (HAD07 and phage integrase INT10); contig#44, affiliated to Acidobacteria (HAD09 and transposase TRP26); and contig#54, affiliated to Verrucomicrobia (HAD05, integrase INT16 and transposases TRP26–27). Protein analysis revealed reliable cytoplasmic predictions for all dehalogenases (Table 3), except HAD07 and HAD09, which are typical transmembrane proteins with several helices, and HAD10, which has a signal peptide and probable periplasmic (Gram-negative) or extracellular (Gram-positive) location predictions.

3.7. Detection of the 2-domain bacterial laccase

Two proteins with bacterial laccase domains were identified using custom profile hidden Markov models (Ausec and Zakrzewski, 2011). Subsequent analysis showed that they originate from the same contig, as neighboring genes separated by a stop codon. Comparison with other laccases based on sequence alignment indicated that the stop codon might result from a point mutation substituting the usual arginine codon (CGA or AGA) with a stop codon (TGA) (data not shown). As the read alignment for this contig showed deep coverage without any read mismatches at the stop codon location (the TGA codon was covered 103 times), this mutation most likely occurred in the original organism, resulting in a truncated protein. Apart from this mutation, the complete gene and the predicted full-length protein (LAC01–02, Table 2) would have all the usual characteristics of a typical two-domain bacterial laccase in terms of length, organization of the copper-binding regions, and presence of an N-terminal signal peptide. Consistent with the probes used, the laccase shares similarities with Proteobacteria at the nucleic acid level. The contig as a whole is affiliated to Coriobacteriaceae and also harbors two transposases (TRP33–34) and a phage integrase (INT18). Protein analysis indicated the presence of a signal peptide, and predictions support periplasmic (Gram-negative) or cytoplasmic (Gram-positive) locations (Table 3).
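The point-mutation argument can be checked against the standard genetic code: among the three stop codons, only TGA lies a single substitution away from arginine codons, namely CGA and AGA (a change at the first position to T). The sketch below, with hypothetical function names, lists the arginine codons one substitution away from each stop codon and scans a coding sequence for premature in-frame stop codons. It is an illustration of the reasoning, not the alignment analysis performed in the study.

```python
# Standard genetic code: the six arginine codons and the three stop codons.
ARG_CODONS = {"CGT", "CGC", "CGA", "CGG", "AGA", "AGG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def hamming(a: str, b: str) -> int:
    """Number of differing positions between two equal-length codons."""
    return sum(x != y for x, y in zip(a, b))


def arg_codons_one_substitution_from(stop: str) -> set[str]:
    """Arginine codons that a single point mutation could turn into `stop`."""
    return {codon for codon in ARG_CODONS if hamming(codon, stop.upper()) == 1}


def premature_stops(cds: str) -> list[int]:
    """0-based codon indices of in-frame stop codons before the final codon."""
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    return [i for i, codon in enumerate(codons[:-1]) if codon in STOP_CODONS]


for stop in sorted(STOP_CODONS):
    print(stop, sorted(arg_codons_one_substitution_from(stop)))
print(premature_stops("ATGCGATGAGGCTAA"))  # toy sequence, not the LAC01-02 gene
```

This prints an empty list for TAA and TAG, `['AGA', 'CGA']` for TGA, and `[2]` for the toy sequence, where an arginine-like position (codon index 2) has become TGA before the genuine terminal TAA.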


4. Discussion

4.1. Interest and limits of the approach

In this study, 405 504 clones from a soil metagenomic fosmid library were screened using a DNA/DNA hybridization-based approach. We used a mixed pool of radio-labeled probes targeting several genes encoding enzymes of industrial interest (glycosyl hydrolases related to chitin degradation, dehalogenases, laccases) and mobile genetic elements (integron integrases and insertion sequences). The screening resulted in the identification of 88 positive clones and 94 putative coding DNA sequences (CDS) of interest embedded in 42 metagenomic contigs. One of the major advantages of this approach is the ability to screen a large number of clones for several genetic targets at the same time. In this study, DNA/DNA hybridization screening was efficient and sensitive enough to detect almost all the desired genes. The flexibility of the hybridization process is the second advantage of this technique. Temperature, incubation time and salt concentration are the key parameters controlling hybridization stringency, and this flexibility should, in theory, help recover dissimilar sequences diverging from the original probe. In this experiment, preliminary hybridization tests were performed to define the right conditions. If the stringency of the pairing process had been too high, only hits nearly identical to the probe sequences would have been retrieved, missing the opportunity to find novel sequences. Conversely, if it had been too low, nonspecific hybridization would have occurred and masked positive hits (see File S4 for a complete description of the preliminary hybridization procedure). As expected, this strategy yielded hybridization hits against genetic targets consistent with the nucleic acid origin of the probes, but also, in some cases, recovered very divergent sequences (e.g. HAD01, 05 and 08). Considering the high diversity of soil microbial communities, the efficiency of the technique was in line with previously reported results from function-based screening of soil metagenomic libraries (Nacke et al., 2012; Wang et al., 2009; Kim et al., 2008). However, as several genes were screened simultaneously in this work, the yield would be lower when each target is considered independently.
In this sense, as we observed 14 cases of co-occurrence on the same sequence (5 contigs with both catabolic genes and MGE, and 9 with both IS/transposase and integrase), we also suspect co-hybridization of several probes on the same metagenomic inserts, probably resulting in stronger hybridization signals and easier identification. Detection of positive clones involves subjective visual inspection of the double spots on membranes. Thanks to the high sensitivity of ³³P, the double spots were clear enough to be easily detected on the membrane (File S5). However, due to uneven background noise and parasitic signals (e.g. dust and cell debris), the visualization is not perfect and might have resulted in the selection of false positives. Conversely, some positive spots might have been missed due to quenching, which would limit the signal intensity and hide it within the background. As a consequence, we deliberately broadened the selection by systematically recovering any clone displaying a suspicious signal. This choice, together with the deliberately low pairing stringency, could easily explain the relatively high abundance of contigs without any trace of putative hybridization hits. Indeed, only 40% (42/106) of the assembled contigs carried a targeted gene, meaning that the remainder corresponds either to unassembled contig fragments or to false positives introduced by nonspecific probe hybridization and parasitic signals. Nevertheless, hybridization against uncharacterized hypothetical CDS with no clear database affiliation (which represent 82% of all CDS according to RAST and the SEED classifier) remains conceivable. No matter how innovative the technique, annotation against public databases is still required, and it is often the critical step when trying to discover novel genes.
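To make the stringency trade-off discussed in this section more concrete, the sketch below uses two textbook approximations that were not necessarily used by the authors (the actual conditions are described in File S4): a salt-adjusted melting temperature for short DNA duplexes, Tm ≈ 81.5 + 16.6·log10[Na+] + 0.41·(%GC) − 600/N, and the rule of thumb that each 1% of mismatched bases lowers Tm by roughly 1 °C. Both are rough estimates; the function names, the example probe and the 35% mismatch figure are hypothetical.

```python
import math


def salt_adjusted_tm(seq: str, na_molar: float) -> float:
    """Approximate duplex Tm (deg C) from GC content, length N and Na+ concentration.

    Textbook approximation: Tm = 81.5 + 16.6*log10([Na+]) + 0.41*(%GC) - 600/N.
    It ignores formamide, mismatches and probe-specific effects.
    """
    if not seq:
        raise ValueError("sequence must not be empty")
    if na_molar <= 0:
        raise ValueError("Na+ concentration must be positive")
    seq = seq.upper()
    gc_percent = 100.0 * (seq.count("G") + seq.count("C")) / len(seq)
    return 81.5 + 16.6 * math.log10(na_molar) + 0.41 * gc_percent - 600.0 / len(seq)


def mismatch_adjusted_tm(tm_perfect: float, percent_mismatch: float,
                         deg_per_percent: float = 1.0) -> float:
    """Rule of thumb: each 1% of mismatched bases lowers Tm by about 1 deg C."""
    return tm_perfect - deg_per_percent * percent_mismatch


probe = "ACGT" * 10  # invented 40-mer with 50% GC, not a probe from this study
tm = salt_adjusted_tm(probe, na_molar=1.0)
print(round(tm, 1))                              # 87.0
print(round(mismatch_adjusted_tm(tm, 35.0), 1))  # 52.0 for a target with ~65% identity
```

Under these assumptions, a target sharing only about 65% identity with the probe would form a duplex roughly 35 °C less stable than a perfect match, which is why hybridization and washing must be run well below the perfect-match Tm to retain divergent hits, at the cost of more nonspecific signals.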

4.2. Contig assembly and taxonomical assignation

The assembly of reads into high-quality contigs was a critical step in the workflow, as it influences downstream biases and, consequently, the biological conclusions. In this study, 106 high-quality contigs were successfully assembled (20× minimum average coverage and 5 kb minimum length), and their assembly alignments were systematically inspected visually for weak-coverage spots. However, more sequences than expected were generated (106 instead of 88), an indication of assembly limitations. The high occurrence of insertion sequences/transposases (50) probably generated sequence redundancy during assembly, which is known to considerably limit contig extension (Green, 2009). However, given the expected size of the metagenomic inserts (File S1), we found exactly 88 contigs larger than 15 kb (Figure 2 from File S6). We can therefore assume that all core metagenomic inserts were fully assembled. The remaining 18 contigs probably correspond to detached 5′ and 3′ ends that could not be linked to their respective core insert because of sequence redundancy or insufficient sequencing depth. Nevertheless, assembly limitations did not compromise the goals of this study. For instance, owing to their relatively long size, the contigs generated in this study are well suited to sequence-based taxonomical affiliation through computation of k-mer frequencies (Patil et al., 2012). The taxonomical profile obtained from the hybridized clone library differs considerably from the one generated by shotgun sequencing of the same soil metagenomic DNA solution (Fig. 2). This suggests that the screening step enriched for sequences related to specific bacterial groups, including a noticeable enrichment in Actinobacteria, mostly dominated by Coriobacteriales and Actinomycetales, as well as an increase in Chloroflexi, Acidobacteria and Proteobacteria (Alpha- and Delta-). A closer look at the taxonomical affiliation of the 42 contigs harboring hybridization hits also revealed that they were mostly affiliated to Coriobacteriales and Actinomycetales, as well as Planctomycetes, Verrucomicrobia, Chloroflexi and unclassified bacteria.
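The reasoning above combines two filters: the quality thresholds applied to assembled contigs (at least 20× mean coverage and 5 kb length) and a size cut-off matching the expected fosmid insert size (contigs larger than 15 kb counted as core inserts, shorter ones as detached ends). A minimal sketch with a hypothetical `Contig` record and invented values; it illustrates the bookkeeping only, not the assembly pipeline used in the study.

```python
from dataclasses import dataclass


@dataclass
class Contig:
    name: str
    length_bp: int
    mean_coverage: float


def high_quality(contigs: list[Contig], min_coverage: float = 20.0,
                 min_length_bp: int = 5_000) -> list[Contig]:
    """Keep contigs meeting both the coverage and the length thresholds."""
    return [c for c in contigs
            if c.mean_coverage >= min_coverage and c.length_bp >= min_length_bp]


def split_core_and_fragments(contigs: list[Contig], core_min_bp: int = 15_000):
    """Separate putative full-length inserts (> core_min_bp) from shorter fragments."""
    core = [c for c in contigs if c.length_bp > core_min_bp]
    fragments = [c for c in contigs if c.length_bp <= core_min_bp]
    return core, fragments


# Invented example values, not data from this study.
assembly = [
    Contig("contig_A", 38_000, 120.0),
    Contig("contig_B", 9_000, 45.0),
    Contig("contig_C", 4_000, 60.0),   # fails the 5 kb length threshold
    Contig("contig_D", 22_000, 12.0),  # fails the 20x coverage threshold
]
kept = high_quality(assembly)
core, fragments = split_core_and_fragments(kept)
print([c.name for c in core], [c.name for c in fragments])
```

Applied to the study's 106 high-quality contigs, the same split would give the 88 core inserts and 18 shorter contigs reported above; the cut-off itself rests on the expected insert size (File S1).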
As these differences were most likely due to the prevalence of MGE hits, they might only reflect the effect of the set of probes selected for this particular category. Moreover, we deliberately performed hybridization under low pairing stringency to maximize the recovery of novel sequences. Therefore, even if taxonomical coherence was observed when considering nucleic acid similarities between CDS hits and the probes, drawing conclusions about the taxonomical assignation of contigs on the basis of probe origin alone would be too unreliable.

4.3. Detection of mobile genetic elements (integrases and IS/transposases)

Considering that the catabolic genes we screened are already known to be widely distributed across bacterial phyla in soils, including laccases (Ausec et al., 2011), chitinases (Jacquiod et al., 2013) and dehalogenases (Richardson, 2013), the relatively high prevalence of mobile genetic element CDS recovered (n = 76, Table S3 and Table S4) can be attributed either to efficient hybridization toward these elements or to their initially high prevalence within the clone library or the Rothamsted soil itself. Most of the hits were related to IS/transposase systems and displayed a high level of sequence diversity, probably resulting from the low pairing stringency applied. This is nonetheless consistent with previous observations of the remarkably high sequence diversity of IS/transposases (Kamoun et al., 2013). In addition, 4 integron integrase CDS were identified (Table 2), as well as 14 phage integrases and 8 other integrases (Table S3). All these integrases belong to the tyrosine recombinase protein family, known to be conserved throughout bacteria (Das et al., 2013) and phages (Groth and Calos, 2004). The selection and identification of these CDS hits is also most likely due to the low pairing stringency applied in this study.
The majority of the MGE CDS were observed on Actinobacteria-related contigs (Coriobacteriaceae and Actinomycetales). Actinobacteria are known to contain a large reservoir of MGEs (Ghinet et al., 2011). Members of the Actinomycetales order harboring MGEs have already been isolated, including Mycobacterium spp. (Martin et al., 1990) and Corynebacterium spp. (Nandi et al., 2004; Tauch et al., 2002). However, little is known about the Coriobacteriia class, which includes the Coriobacteriaceae family (Gupta et al., 2013). Our results show a noticeable prevalence of MGEs within the Coriobacteriaceae family and support their possible involvement in horizontal gene transfer in natural soils. Other MGE CDS were identified on contigs belonging to members of the PVC superphylum (Planctomycetes, Verrucomicrobia and Chlamydiae; Wagner and Horn, 2006). Although the PVC superphylum has received increasing attention because of its prevalence in a wide range of habitats such as soil (Bergmann et al., 2011) and its potential for hosting novel biomolecules (Jeske et al., 2013), its ecological role remains largely unknown, especially regarding its involvement in horizontal gene transfer and exogenous DNA acquisition (Budd and Devos, 2012; Domman et al., 2010; Gourbeyre et al., 2010; Griffiths and Gupta, 2006). The last fraction of MGEs was observed on sequences related to Chloroflexi. Members of this phylum were recently reported to be involved in HGT as donors in the deep sea (Brochier-Armanet et al., 2011), but not yet in soil. Our study supports the notion that environmental bacteria related to the Verrucomicrobia, Planctomycetes and Chloroflexi phyla might play an active role in the horizontal transfer of genetic material in natural soil.

4.4. Detection of putative glycoside hydrolases related to chitin degradation

Highly divergent probes were deliberately selected to increase our chances of detecting potentially new glycoside hydrolases related to family GH18 (File S2). Unfortunately, no convincing chitinase hits were found. However, several genes encoding glycosyl hydrolases and deacetylases involved in chitin and NAG utilization were detected and are therefore reported here (Table 2).
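The novelty of these hits is expressed relative to their closest database hits (Table 2). As a minimal sketch of how a percent identity can be derived from an existing pairwise alignment, assuming the alignment is given as two equal-length strings with '-' for gaps, the hypothetical function below counts identical residues over aligned columns. Tools differ in the denominator they use (all aligned columns, or only columns without gaps), so values are comparable only when computed the same way.

```python
def percent_identity(aln_a: str, aln_b: str, denominator: str = "alignment") -> float:
    """Percent identity of two pre-aligned sequences ('-' marks a gap).

    denominator="alignment": all aligned columns (columns gapped in both are dropped);
    denominator="ungapped": only columns where neither sequence has a gap.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    # Drop columns that are gaps in both sequences (alignment artefacts).
    columns = [(a, b) for a, b in zip(aln_a.upper(), aln_b.upper()) if (a, b) != ("-", "-")]
    # A column is identical only if both residues match (a double gap was dropped above).
    identical = sum(1 for a, b in columns if a == b)
    if denominator == "alignment":
        n = len(columns)
    elif denominator == "ungapped":
        n = sum(1 for a, b in columns if a != "-" and b != "-")
    else:
        raise ValueError(f"unknown denominator: {denominator!r}")
    return 100.0 * identical / n if n else 0.0
```

For example, `percent_identity("MKV-LA", "MRVQLA")` counts 4 identities over 6 columns, whereas the "ungapped" variant counts them over 5 columns and returns 80.0.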
Four CDS related to glycosyl hydrolases of families GH2, GH3, GH20 and GH25 were recovered, showing high protein sequence divergence (44–64%). Amino acid sequence analysis predicted a reliable cytoplasmic location for the GH3 enzyme CHI01 (Table 3). Despite mixed location results, signal peptides were found on CHI02, CHI03 and CHI07, indicating that they are probably exported, which is consistent with this type of enzyme. CHI02 is a hexosaminidase from family GH20 and can be either secreted or membrane-attached (Gruber and Seidl-Seiboth, 2012). CHI03 is a sugar-binding protein from family GH2, which contains exo-β-glucosaminidases (EC 3.2.1.165) that degrade N-deacetylated chitin derivatives such as chitosan. CHI07 is a lysozyme from family GH25 and is also expected to be secreted (Korczynska et al., 2010). In addition, 4 deacetylases involved in chitin and NAG deacetylation were identified (DEA01–04), with high protein sequence divergence (31–50%). DEA01 and DEA02 are chitin deacetylases from family CE4, which catalyze the hydrolysis of acetyl groups, turning chitin into chitosan, while DEA03 and DEA04 are NAG-6-phosphate deacetylases catalyzing the same reaction on NAG released by hexosaminidases. DEA01 and DEA04 have clear cytoplasmic predictions, whereas DEA02 has a predicted signal peptide and transmembrane helix, consistent with the Gram-positive membrane location prediction. DEA03 has no clear location prediction. Contigs harboring CDS related to chitin degradation were found to be specifically enriched in other genes involved in primary metabolism (e.g. protein, carbon and iron metabolism). For example, contig#24, affiliated to Actinomycetales (panel A from File S8) and harboring CHI02, carries many genes involved in carbohydrate metabolism, including ABC sugar transporters for the binding and uptake of short incoming carbohydrates. In addition, three other CDS involved in chitin degradation were observed on this contig: DEA03 and DEA04, affiliated to NAG-6-phosphate deacetylases (EC 3.5.1.25), and an oxidoreductase involved in NAG utilization. Oxidoreductases are important NAD+-dependent enzymes involved in the cleavage of complex carbohydrate polymers (Langston et al., 2011). This contig probably carries part of an operon involved in complex carbohydrate degradation. Another intriguing aspect of this contig is the presence of three transposases (TRP09–11). Furthermore, the GC profile of contig#24 displayed a clear drop in GC content at the IS/transposase location, which could indicate an HGT event (panel A from File S8). Previous studies have already reported potential horizontal transfer of chitinase genes between eukaryotes and prokaryotes based on protein analysis (Ubhayasekera and Karlsson, 2012; Lohtander et al., 2008), as well as a strong involvement of HGT in the spread of genes encoding plant cell wall polysaccharide-degrading enzymes (Pauchet and Heckel, 2013). Our data also suggest that genes involved in the degradation of complex carbohydrates such as chitin could be prone to horizontal transfer among soil bacteria.

4.5. Dehalogenase hits

The 8 dehalogenase hits were all affiliated to the HAD superfamily (haloacid dehalogenase; Lahiri et al., 2004), with low amino acid identity against their closest database hits (46–75%, Table 2). This is consistent with this type of enzyme, whose protein sequences are known to share low identity (Kulakova et al., 1997). These enzymes are most likely found in organohalide-respiring bacteria, which use halogenated molecules as electron acceptors. This is consistent with most location predictions (Table 3), which suggest that the dehalogenases we found are cytoplasmic or membrane-associated enzymes. This is particularly true for HAD07 and HAD09, which display multiple predicted transmembrane helices.
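The GC drop reported for contig#24 (and, below, for contig#54 and contig#59) is read from GC-profile plots (panels A–C from File S8). As a rough, hedged illustration of the idea, the sketch below computes GC content in sliding windows and flags windows falling a chosen amount below the mean window value. The window size, step and 0.10 drop threshold are arbitrary assumptions, and this simple scan is a stand-in for, not a reimplementation of, the segmentation performed by the cited GC-Profile tool (Gao and Zhang, 2006).

```python
import math


def gc_windows(seq: str, window: int = 1000, step: int = 500) -> list[tuple[int, float]]:
    """GC fraction (over A/C/G/T only) in sliding windows as (0-based start, GC) pairs.

    A trailing stretch shorter than `step` may not be covered by the last window.
    """
    seq = seq.upper()
    out = []
    for start in range(0, max(len(seq) - window, 0) + 1, step):
        chunk = seq[start:start + window]
        acgt = sum(chunk.count(base) for base in "ACGT")
        gc = chunk.count("G") + chunk.count("C")
        out.append((start, gc / acgt if acgt else math.nan))
    return out


def low_gc_windows(seq: str, window: int = 1000, step: int = 500, drop: float = 0.10) -> list[int]:
    """Starts of windows whose GC fraction is more than `drop` below the mean window GC."""
    wins = [(s, g) for s, g in gc_windows(seq, window, step) if not math.isnan(g)]
    if not wins:
        return []
    mean = sum(g for _, g in wins) / len(wins)
    return [s for s, g in wins if g < mean - drop]
```

On a synthetic 5 kb sequence made of 2 kb of GC, 1 kb of AT and 2 kb of GC, the windows starting at 1500, 2000 and 2500 are flagged, i.e. those overlapping the AT-rich insertion.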
HAD10 is the only exception, as it has a predicted signal peptide and a possible periplasmic or extracellular location depending on the Gram type considered. Besides the use of organohalide-respiring bacteria in bioremediation for the dechlorination of polluted sites, this respiration process is known to occur in uncontaminated soils (Leri and Myneni, 2010), owing to the natural occurrence of organohalogens (e.g. organobromides and organochlorines) (Krzmarzick et al., 2012). This could readily explain the detection of such enzymes in the untreated natural soil from the Park Grass Experiment at the Rothamsted Research Station. Furthermore, HAD05, HAD07 and HAD09 were found close to MGEs on their respective contigs, which suggests a potential involvement of horizontal gene transfer. Indeed, genes involved in xenobiotic compound degradation are known to be part of the gene pool that can be disseminated and acquired via horizontal gene transfer, as they are often found on plasmids and/or near mobile genetic elements such as insertion sequences (Cérémonie et al., 1999). For HAD05, this observation is reinforced by a shift in the GC profile of contig#54 toward a lower GC content at the IS/transposase location (panel B from File S8).

4.6. Detection of the bacterial laccase

The truncated laccase is 65% similar to its closest hit in the database (Table 2), indicating noticeable novelty at the protein level. As the truncated protein has all the expected characteristics of a classical two-domain laccase in terms of structural organization, size and presence of a signal peptide, we hypothesized that the stop codon resulted from a point mutation in the original organism, splitting the ORF into two multicopper oxidases. The laccase hit was found on a contig affiliated to the Coriobacteriaceae family (Actinobacteria), within a coherent genetic context (panel C from File S8) including several genes involved in copper homeostasis and heavy metal resistance, as well as two transposase genes (TRP33 and TRP34) and a phage integrase (INT18). The GC profile of this contig (contig#59) shows a clear and sharp drop in GC content at the location of the IS/transposases and the phage integrase, while the part containing the laccase lies in a stable GC-rich region (panel C from File S8). Bacterial laccases are known to be associated with mobile genetic elements, as they can be found on plasmids (Ausec et al., 2011). The truncated two-domain laccase identified in the present study was probably a functional enzyme, and its noticeable sequence differences from known proteins in public databases enrich our knowledge of this class of enzymes.

5. Conclusion


Sequence-based screening of metagenomic clone libraries using DNA/DNA hybridization with mixed oligonucleotide probes is a useful and relevant strategy for finding novel metagenomic resources. The technique is efficient, and our results support further applications (e.g. more clones, more targets and probes, and improved identification of positive clones). This approach can complement traditional functional screening, as it allows large numbers of clones to be handled while retaining only the most promising candidates based on sequence similarities. The reduced number of positive clones can then be tested by activity-based screening to link the identified coding DNA sequences to the desired activities. This method also proved efficient in addressing fundamental aspects of microbial ecology, such as the co-occurrence of mobile genetic elements and important catabolic genes, thus providing indications of their potential dissemination through horizontal transfer.
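Assessing the co-occurrence of mobile genetic elements and catabolic genes amounts to asking which catabolic CDS lie near an MGE CDS on the same contig. A minimal sketch follows, assuming CDS annotations are available as simple records with a hypothetical `category` label and using an arbitrary 5 kb distance cutoff chosen purely for illustration; the record type and function names are hypothetical and not part of the pipeline used in this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CDS:
    name: str
    contig: str
    start: int      # 1-based, inclusive
    end: int        # 1-based, inclusive
    category: str   # hypothetical labels: "catabolic", "MGE" or anything else


def gap_bp(a: CDS, b: CDS) -> int:
    """Number of bases between two features on the same contig (0 if adjacent or overlapping)."""
    return max(0, max(a.start, b.start) - min(a.end, b.end) - 1)


def mge_neighbours(features: list[CDS], max_gap: int = 5000) -> dict[str, list[str]]:
    """Map each catabolic CDS to the MGE CDS lying within max_gap bp on the same contig."""
    mges = [f for f in features if f.category == "MGE"]
    result: dict[str, list[str]] = {}
    for f in features:
        if f.category != "catabolic":
            continue
        near = sorted(m.name for m in mges if m.contig == f.contig and gap_bp(f, m) <= max_gap)
        if near:
            result[f.name] = near
    return result
```

Proximity alone does not demonstrate transfer; in the discussion above it is combined with GC-content shifts before suggesting an HGT event.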

Acknowledgements

This research was funded in part by the European Union project MetaExplore and in part by the French Research Agency (Agence Nationale de la Recherche) ANR project Metasoil. SJ, LA and VD were supported by the EU project MetaExplore. TOD was supported by doctoral funding from the Rhône-Alpes Region. ZX was supported by the ITN Marie-Curie project Trainbiodiverse. We would like to acknowledge Patrick Robe, engineer at Libragen SA (Toulouse, France), for the library construction and quality control checking.


Appendix A. Supplementary data


Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.jbiotec.2014.03.036.


References

Ausec, L., Zakrzewski, M., Goesmann, A., Schlüter, A., Mandic-Mulec, I., 2011. Bioinformatic analysis reveals high diversity of bacterial genes for laccase-like enzymes. PLoS ONE 6, e25724.

Aziz, R.K., Bartels, D., Best, A.A., DeJongh, M., Disz, T., Edwards, R.A., Formsma, K., Gerdes, S., Glass, E.M., Kubal, M., Meyer, F., Olsen, G.J., Olson, R., Osterman, A.L., Overbeek, R.A., McNeil, L.K., Paarmann, D., Paczian, T., Parrello, B., Pusch, G.D., Reich, C., Stevens, R., Vassieva, O., Vonstein, V., Wilke, A., Zagnitko, O., 2008. The RAST server: rapid annotations using subsystems technology. BMC Genomics 9, 75.

Bergmann, G.T., Bates, S.T., Eilers, K.G., Lauber, C.L., Caporaso, J.G., Walters, W.A., Knight, R., Fierer, N., 2011. The under-recognized dominance of Verrucomicrobia in soil bacterial communities. Soil Biol. Biochem. 43, 1450–1455.

Berlemont, R., Pipers, D., Delsaute, M., Angiono, F., Feller, G., Galleni, M., Power, P., 2011. Exploring the Antarctic soil metagenome as a source of novel cold-adapted enzymes and genetic mobile elements. Rev. Argent. Microbiol. 43, 94–103.

Bertrand, H., Poly, F., Van, V.T., Lombard, N., Nalin, R., Vogel, T.M., Simonet, P., 2005. High molecular weight DNA recovery from soils prerequisite for biotechnological metagenomic library construction. J. Microbiol. Methods 62, 1–11.

Brochier-Armanet, C., Deschamps, P., López-García, P., Zivanovic, Y., Rodríguez-Valera, F., Moreira, D., 2011. Complete-fosmid and fosmid-end sequences reveal frequent horizontal gene transfers in marine uncultured planktonic archaea. ISME J. 5, 1291–1302.

Budd, A., Devos, D.P., 2012. Evaluating the evolutionary origins of unexpected character distributions within the Bacterial Planctomycetes–Verrucomicrobia–Chlamydiae Superphylum. Front. Microbiol. 3, 401.

Cérémonie, H., Boubakri, H., Mavingui, P., Simonet, P., Vogel, T.M., 1999. Plasmid-encoded gamma-hexachlorocyclohexane degradation genes and insertion sequences in Sphingobium francense (ex-Sphingomonas paucimobilis Sp+). FEMS Microbiol. Lett. 257, 243–252.

Cretoiu, M.S., Kielak, A.M., Abu Al-Soud, W., Sørensen, S.J., van Elsas, J.D., 2012. Mining of unexplored habitats for novel chitinases – chiA as a helper gene proxy in metagenomics. Appl. Microbiol. Biotechnol. 94, 1347–1358.

Das, B., Martínez, E., Midonet, C., Barre, F.X., 2013. Integrative mobile elements exploiting Xer recombination. Trends Microbiol. 21, 23–30.

Delmont, T.O., Prestat, E., Keegan, K.P., Faubladier, M., Robe, P., Clark, I.M., Pelletier, E., Hirsch, P.R., Meyer, F., Gilbert, J.A., Le Paslier, D., Simonet, P., Vogel, T.M., 2012. Structure, fluctuation and magnitude of a natural grassland soil metagenome. ISME J. 6, 1677–1687.

Demanèche, S., David, M.M., Navarro, E., Simonet, P., Vogel, T.M., 2009. Evaluation of functional gene enrichment in a soil metagenomic clone library. J. Microbiol. Methods 76, 105–107.

Domman, D.B., Steven, B.T., Ward, N.L., 2010. Random transposon mutagenesis of Verrucomicrobium spinosum DSM 4136(T). Arch. Microbiol. 193, 307–312.

FAO, 2006. Guidelines for Soil Description. FAO, Rome, Italy. ftp://ftp.fao.org/agl/agll/docs/guidel soil descr.pd.

Foerstner, K.U., Doerks, T., Creevey, C.J., Doerks, A., Bork, P., 2008. A computational screen for type I polyketide synthases in metagenomics shotgun data. PLoS ONE 3, e3515.

Gabor, E.M., de Vries, E.J., Janssen, D.B., 2004. Construction, characterization, and use of small-insert gene banks of DNA isolated from soil and enrichment cultures for the recovery of novel amidases. Environ. Microbiol. 6, 948–958.

Gao, F., Zhang, C.T., 2006. GC-profile: a web-based tool for visualizing and analyzing the variation of GC content in genomic sequences. Nucleic Acids Res. 34 (Web Server issue), W686–W691.

Ghinet, M.G., Bordeleau, E., Beaudin, J., Brzezinski, R., Roy, S., Burrus, V., 2011. Uncovering the prevalence and diversity of integrating conjugative elements in Actinobacteria. PLoS ONE 6, e27846.

Ginolhac, A., Jarrin, C., Gillet, B., Robe, P., Pujic, P., Tuphile, K., Bertrand, H., Vogel, T.M., Perriere, G., Simonet, P., Nalin, R., 2004. Phylogenic analysis of polyketide synthase I domains from soil metagenomic libraries allows selection of promising clones. Appl. Environ. Microbiol. 70, 5522–5527.

Glogauer, A., Martini, V.P., Faoro, H., Couto, G.H., Müller-Santos, M., Monteiro, R.A., Mitchell, D.A., de Souza, E.M., Pedrosa, F.O., Krieger, N., 2011. Identification and characterization of a new true lipase isolated through metagenomic approach. Microb. Cell Fact. 10, 54.

Gourbeyre, E., Siguier, P., Chandler, M., 2010. Route 66: investigations into the organisation and distribution of the IS66 family of prokaryotic insertion sequences. Res. Microbiol. 161, 136–143.

Green, P., 2009. Whole-genome disassembly. Proc. Natl. Acad. Sci. U. S. A. 99, 4143–4144.

Groth, A.C., Calos, M.P., 2004. Phage integrases: biology and applications. J. Mol. Biol. 335, 667–678.

Griffiths, E., Gupta, R.S., 2006. Lateral transfers of serine hydroxymethyl transferase (glyA) and UDP-N-acetylglucosamine enolpyruvyl transferase (murA) genes from free-living Actinobacteria to the parasitic Chlamydiae. J. Mol. Evol. 63, 283–296.

Gruber, S., Seidl-Seiboth, V., 2012. Self versus non-self: fungal cell wall degradation in Trichoderma. Microbiology 58, 26–34.

Gupta, R.S., Chen, W.J., Adeolu, M., Chai, Y., 2013. Molecular signatures for the class Coriobacteria and its different clades; proposal for division of the class Coriobacteriia into the emended order Coriobacteriales, containing the emended family Coriobacteriaceae and Atopobiaceae fam. nov., and Eggerthellales ord. nov., containing the family Eggerthellaceae fam. nov. Int. J. Syst. Evol. Microbiol. 63, 3379–3397.

Haft, D.H., Selengut, J.D., Richter, R.A., Harkins, D., Basu, M.K., Beck, E., 2013. TIGRFAMs and genome properties in 2013. Nucleic Acids Res. 41 (Database issue), D387–D395.

Hall, R.M., 2012. Integrons and gene cassettes: hotspots of diversity in bacterial genomes. Ann. N. Y. Acad. Sci. 1267, 71–78.

Holben, W.E., Jansson, J.K., Chelm, B.K., Tiedje, J.M., 1988. DNA probe method for the detection of specific microorganisms in the soil bacterial community. Appl. Environ. Microbiol. 54, 703–711.

Horn, S.J., Sikorski, P., Cederkvist, J.B., Vaaje-Kolstad, G., Sørlie, M., Synstad, B., Vriend, G., Vårum, K.M., Eijsink, V.G., 2006. Costs and benefits of processivity in enzymatic degradation of recalcitrant polysaccharides. Proc. Natl. Acad. Sci. U. S. A. 103, 18089–18094.

Hu, Y., Fu, C., Huang, Y., Yin, Y., Cheng, G., Lei, F., Lu, N., Li, J., Ashforth, E.J., Zhang, L., Zhu, B., 2010. Novel lipolytic genes from the microbial metagenomic library of the South China Sea marine sediment. FEMS Microbiol. Ecol. 72, 228–237.

Huang, L., Cagnon, C., Caumette, P., Duran, R., 2009. First gene cassettes of integrons as targets in finding adaptive genes in metagenomes. Appl. Environ. Microbiol. 75, 3823–3825.

Huang, Y., Niu, B., Gao, Y., Fu, L., Li, W., 2010. CD-HIT Suite: a web server for clustering and comparing biological sequences. Bioinformatics 26, 680–682.

Jacquiod, S., Franqueville, L., Cécillon, S., Vogel, T.M., Simonet, P., 2013. Soil bacterial community shifts after chitin enrichment: an integrative metagenomic approach. PLoS ONE 8, e79699.

Jeske, O., Jogler, M., Petersen, J., Sikorski, J., Jogler, C., 2013. From genome mining to phenotypic microarrays: Planctomycetes as source for novel bioactive molecules. Antonie Leeuwenhoek 104, 551–567.

Käll, L., Krogh, A., Sonnhammer, E.L., 2004. A combined transmembrane topology and signal peptide prediction method. J. Mol. Biol. 338, 1027–1036.

Kamoun, C., Payen, T., Hua-Van, A., Filée, J., 2013. Improving prokaryotic transposable elements identification using a combination of de novo and profile HMM methods. BMC Genomics 14, 700.

Kim, J.S., Lim, H.K., Lee, M.H., Park, J.H., Hwang, E.C., Moon, B.J., Lee, S.W., 2009. Production of porphyrin intermediates in Escherichia coli carrying soil metagenomic genes. FEMS Microbiol. Lett. 295, 42–49.

Kim, J.S., Lee, C.M., Han, B.R., Kim, M.Y., Yeo, Y.S., Yoon, S.H., Koo, B.S., Jun, H.K., 2008. Characterization of a gene encoding cellulase from uncultured soil bacteria. FEMS Microbiol. Lett. 282, 44–51.

Korczynska, J.E., Danielsen, S., Schagerlöf, U., Turkenburg, J.P., Davies, G.J., Wilson, K.S., Taylor, E.J., 2010. The structure of a family GH25 lysozyme from Aspergillus fumigatus. Acta Crystallogr. Sect. F Struct. Biol. Cryst. Commun. 66, 973–977.

Kulakova, A.N., Larkin, M.J., Kulakov, L.A., 1997. The plasmid-located haloalkane dehalogenase gene from Rhodococcus rhodochrous NCIMB 13064. Microbiology 143, 109–115.

Krzmarzick, M.J., Crary, B.B., Harding, J.J., Oyerinde, O.O., Leri, A.C., Myneni, S.C., Novak, P.J., 2012. Natural niche for organohalide-respiring Chloroflexi. Appl. Environ. Microbiol. 78, 393–401.

Lahiri, S.D., Zhang, G., Dai, J., Dunaway-Mariano, D., Allen, K.N., 2004. Analysis of the substrate specificity loop of the HAD superfamily cap domain. Biochemistry 43, 2812–2820.

Langston, J.A., Shaghasi, T., Abbate, E., Xu, F., Vlasenko, E., Sweeney, M.D., 2011. Oxidoreductive cellulose depolymerization by the enzymes cellobiose dehydrogenase and glycoside hydrolase 61. Appl. Environ. Microbiol. 77, 7007–7015.

Lefevre, F., Robe, P., Jarrin, C., Ginolhac, A., Zago, C., Auriol, D., Vogel, T.M., Simonet, P., Nalin, R., 2008. Drugs from hidden bugs: their discovery via untapped resources. Res. Microbiol. 159, 153–161.

Leis, B., Angelov, A., Liebl, W., 2013. Screening and expression of genes from metagenomes. Adv. Appl. Microbiol. 83, 1–68.

Leri, A.C., Myneni, S.C.B., 2010. Organochlorine turnover in forest ecosystems: the missing link in the terrestrial chlorine cycle. Global Biogeochem. Cycles 24, GB4021.

Lohtander, K., Pasonen, H.L., Aalto, M.K., Palva, T., Pappinen, A., Rikkinen, J., 2008. Phylogeny of chitinases and its implications for estimating horizontal gene transfer from chitinase-transgenic silver birch (Betula pendula). Environ. Biosafety Res. 7, 227–239.

Madupu, R., Richter, A., Dodson, R.J., Brinkac, L., Harkins, D., Durkin, S., Shrivastava, S., Sutton, G., Haft, D., 2012. CharProtDB: a database of experimentally characterized protein annotations. Nucleic Acids Res. 40 (Database issue), D237–D241.

Magrane, M., UniProt Consortium, 2011. UniProt Knowledgebase: a hub of integrated protein data. Database 2011, bar009.

Manichanh, C., Chapple, C.E., Frangeul, L., Gloux, K., Guigo, R., Dore, J., 2008. A comparison of random sequence reads versus 16S rDNA sequences for estimating the biodiversity of a metagenomic library. Nucleic Acids Res. 36, 5180–5188.

Martin, C., Timm, J., Rauzier, J., Gomez-Lus, R., Davies, J., Gicquel, B., 1990. Transposition of an antibiotic resistance element in Mycobacteria. Nature 345, 739–743.

Milne, I., Stephen, G., Bayer, M., Cock, P.J.A., Pritchard, L., Cardle, L., Shaw, P.D., Marshall, D., 2013. Using Tablet for visual exploration of second-generation sequencing data. Brief. Bioinform. 14, 193–202.

Mocali, S., Benedetti, A., 2010. Exploring research frontiers in microbiology: the challenge of metagenomics in soil microbiology. Res. Microbiol. 161, 497–505.

Nacke, H., Engelhaupt, M., Brady, S., Fischer, C., Tautz, J., Daniel, R., 2012. Identification and characterization of novel cellulolytic and hemicellulolytic genes and enzymes derived from German grassland soil metagenomes. Biotechnol. Lett. 34, 663–675.

Nandi, S., Maurer, J.J., Hofacre, C., Summers, A.O., 2004. Gram-positive bacteria are a major reservoir of class 1 antibiotic resistance integrons in poultry litter. Proc. Natl. Acad. Sci. U. S. A. 101, 7118–7122.

Pauchet, Y., Heckel, D.G., 2013. The genome of the mustard leaf beetle encodes two active xylanases originally acquired from bacteria through horizontal gene transfer. Proc. Biol. Sci. 280, 20131021.

Parachin, N.S., Gorwa-Grauslund, M.F., 2011. Isolation of xylose isomerases by sequence- and function-based screening from a soil metagenomic library. Biotechnol. Biofuels 4, 9.

Parsley, L.C., Consuegra, E.J., Kakirde, K.S., Land, A.M., Harper Jr., W.F., Liles, M.R., 2010. Identification of diverse antimicrobial resistance determinants carried on bacterial, plasmid, or viral metagenomes from an activated sludge microbial assemblage. Appl. Environ. Microbiol. 76, 3753–3757.

Patil, K.R., Roune, L., McHardy, A.C., 2012. The PhyloPythiaS web server for taxonomic assignment of metagenome sequences. PLoS ONE 7, e38581.

Patil, K.R., Haider, P., Pope, P.B., Turnbaugh, P.J., Morrison, M., Scheffer, T., McHardy, A.C., 2011. Taxonomic metagenome sequence assignment with structured output models. Nat. Methods 8, 191–192.

Payne, R.B., Fagervold, S.K., May, H.D., Sowers, K.R., 2013. Remediation of polychlorinated biphenyl impacted sediment by concurrent bioaugmentation with anaerobic halorespiring and aerobic degrading bacteria. Environ. Sci. Technol. 47, 3807–3815.

Petersen, T.N., Brunak, S., von Heijne, G., Nielsen, H., 2011. SignalP 4.0: discriminating signal peptides from transmembrane regions. Nat. Methods 8, 785–786.

Pham, V.D., Palden, T., DeLong, E.F., 2007. Large-scale screens of metagenomic libraries. J. Vis. Exp. 4, 201.

Providenti, M.A., O'Brien, J.M., Ewing, R.J., Paterson, E.S., Smith, M.L., 2006. The copy-number of plasmids and other genetic elements can be determined by SYBR-green-based quantitative real-time PCR. J. Microbiol. Methods 65, 476–487.

Rabausch, U., Juergensen, J., Ilmberger, N., Böhnke, S., Fischer, S., Schubach, B., Schulte, M., Streit, W.R., 2013. Functional screening of metagenome and genome libraries for detection of novel flavonoid-modifying enzymes. Appl. Environ. Microbiol. 79, 4551–4563.

Richardson, R.E., 2013. Genomic insights into organohalide respiration. Curr. Opin. Biotechnol. 24, 498–505.

Rodríguez Couto, S., Toca Herrera, J.L., 2006. Industrial and biotechnological applications of laccases: a review. Biotechnol. Adv. 24, 500–513.

Silvertown, J., Poulton, P., Johnston, E., Edwards, G., Heard, M., Biss, P.M., 2006. The Park Grass experiment 1856–2006: its contribution to ecology. J. Ecol. 94, 801–814.

Sota, M., Yano, H., Nagata, Y., Ohtsubo, Y., Genka, H., Anbutsu, H., Kawasaki, H., Tsuda, M., 2006. Functional analysis of unique class II insertion sequence IS1071. Appl. Environ. Microbiol. 72, 291–297.

Stothard, P., 2000. The sequence manipulation suite: JavaScript programs for analyzing and formatting protein and DNA sequences. BioTechniques 28, 1102–1104.

Szczepanowski, R., Eikmeyer, F., Harfmann, J., Blom, J., Rogers, L.M., Top, E.M., Schlüter, A., 2011. Sequencing and comparative analysis of IncP-1α antibiotic resistance plasmids reveal a highly conserved backbone and differences within accessory regions. J. Biotechnol. 155, 95–103.

Tamura, K., Peterson, D., Peterson, N., Stecher, G., Nei, M., Kumar, S., 2011. MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Mol. Biol. Evol. 28, 2731–2739.

Tasse, L., Bercovici, J., Pizzut-Serin, S., Robe, P., Tap, J., Klopp, C., Cantarel, B.L., Coutinho, P.M., Henrissat, B., Leclerc, M., Doré, J., Monsan, P., Remaud-Simeon, M., Potocki-Veronese, G., 2010. Functional metagenomics to mine the human gut microbiome for dietary fiber catabolic enzymes. Genome Res. 20, 1605–1612.

Tauch, A., Gotker, S., Puhler, A., Kalinowski, J., Thierbach, G., 2002. The 27.8-kb R-plasmid pTET3 from Corynebacterium glutamicum encodes the aminoglycoside adenyltransferase gene cassette aadA9 and the regulated tetracycline efflux system Tet 33 flanked by active copies of the widespread insertion sequence IS6100. Plasmid 48, 117–129.

Ubhayasekera, W., Karlsson, M., 2012. Bacterial and fungal chitinase chiJ orthologues evolve under different selective constraints following horizontal gene transfer. BMC Res. Notes 5, 581.

Uchiyama, T., Abe, T., Ikemura, T., Watanabe, K., 2005. Substrate-induced gene-expression screening of environmental metagenome libraries for isolation of catabolic genes. Nat. Biotechnol. 23, 88–93.

Vogel, T.M., Simonet, P., Jansson, J.K., Hirsch, P.R., Tiedje, J.M., Van Elsas, J.D., Bailey, M.J., Nalin, R., Philippot, L., 2009. TerraGenome: a consortium for the sequencing of a soil metagenome. Nat. Rev. Microbiol. 7, 252.

Wagner, M., Horn, M., 2006. The Planctomycetes, Verrucomicrobia, Chlamydiae and sister phyla comprise a superphylum with biotechnological and medical relevance. Curr. Opin. Biotechnol. 17, 241–249.

Wang, F., Li, F., Chen, G., Liu, W., 2009. Isolation and characterization of novel cellulase genes from uncultured microorganisms in different environmental niches. Microbiol. Res. 164, 650–657.

Wexler, M., Johnston, A.W., 2010. Wide host-range cloning for functional metagenomics. Methods Mol. Biol. 668, 77–96.

Williamson, L.L., Borlee, B.R., Schloss, P.D., Guan, C., Allen, H.K., Handelsman, J., 2005. Intracellular screen to identify metagenomic clones that induce or inhibit a quorum-sensing biosensor. Appl. Environ. Microbiol. 71, 6335–6344.

Yu, C.S., Chen, Y.C., Lu, C.H., Hwang, J.K., 2006. Prediction of protein subcellular location. Proteins Struct. Funct. Bioinform. 64, 643–651.

Please cite this article in press as: Jacquiod, S., et al., Characterization of new bacterial catabolic genes and mobile genetic elements by high throughput genetic screening of a soil metagenomic library. J. Biotechnol. (2014), http://dx.doi.org/10.1016/j.jbiotec.2014.03.036
