Available online at www.sciencedirect.com

ScienceDirect

Recent advances in genome-based polyketide discovery

Eric JN Helfrich¹, Silke Reiter¹ and Jörn Piel

Polyketides are extraordinarily diverse secondary metabolites of great pharmacological value and with interesting ecological functions. The post-genomics era has led to fundamental changes in natural product research by inverting the workflow of secondary metabolite discovery. As opposed to traditional bioactivity-guided screenings, genome mining is an in silico method to screen and analyze sequenced genomes for natural product biosynthetic gene clusters. Since genes for known compounds can be recognized at the early computational stage, genome mining presents an opportunity for dereplication. This review highlights recent progress in bioinformatics, pathway engineering and chemical analytics to extract the biosynthetic secrets hidden in the genomes of both well-known natural product sources and previously neglected bacteria.

Addresses
Institute of Microbiology, Swiss Federal Institute of Technology Zurich (ETH), Vladimir-Prelog-Weg 1-5/10, 8093 Zurich, Switzerland

Corresponding author: Piel, Jörn ([email protected])

¹ These authors contributed equally to this work.

Current Opinion in Biotechnology 2014, 29:107–115
This review comes from a themed issue on Cell and pathway engineering
Edited by Tina Lütke-Eversloh and Keith EJ Tyo

0958-1669/$ – see front matter, © 2014 Elsevier Ltd. All rights reserved.
http://dx.doi.org/10.1016/j.copbio.2014.03.004

Introduction
Polyketides are a remarkably diverse group of secondary metabolites that are enzymatically produced by polyketide synthases (PKSs) [1]. Their structural diversity includes macrolides, enediynes, polyphenols, polyenes, and numerous other structural scaffolds [1]. Many clinically relevant agents, such as antibiotics, immunosuppressants, cytotoxins, and cholesterol-lowering substances, are polyketide-derived [1]. Polyketides are assembled from simple acyl-CoA building blocks in repeated cycles of decarboxylative Claisen-like condensations. For complex (i.e. non-aromatic) polyketides, the great structural variety of encountered core skeletons is accomplished by the selection of building blocks and by facultative enzymatic modifications that affect the reductive state of an intermediate’s β-carbon atom. Further diversification can result from additional auxiliary enzymes acting during or after chain elongation. PKSs encompass several classes and subclasses that differ in enzyme architecture (non-modular, monomodular and multimodular) and mode of operation (e.g. iterative, non-iterative, cis-AT or trans-AT) (Table 1). They can also form hybrid enzymes containing components from different PKS classes and/or non-ribosomal peptide synthetases (NRPSs) [1].

Natural product research by pharmaceutical companies has declined over the last two decades, which is in part a consequence of increased rediscovery rates encountered with traditional screening approaches [2,3]. However, with the advent of the genomic era, it has been recognized that a considerable proportion of the metabolic gene repertoire is expressed only at very low levels or is inactive under standard cultivation conditions [3]. This insight has triggered the development of increasingly sophisticated bioinformatic, genetic, and analytical tools that provide exciting new strategies for natural product discovery (Figure 1). An integral step of this process is the in silico screening of sequenced genomes for secondary metabolite biosynthetic gene clusters, followed by gene analysis and prediction of the biosynthetic product [2]. On the basis of this information, experiments for targeted discovery or activation of a pathway can be designed [3]. In this way it is now also possible to decipher the biosynthetic potential of microorganisms that have previously been inaccessible or considered unproductive by the scientific community [4]. This review highlights some recent examples of genome mining and general strategies that have recently emerged as being useful for polyketide discovery (Figure 1).

Rational targeting of cryptic polyketides
Much progress has been made in understanding the biosynthetic logic of polyketide production. This comprehensive knowledge has led to the definition of rules based on which polyketide structures and activities can be predicted from genome sequences and vice versa (Table 1). Mining the genome of Salinispora tropica for orphan PKS gene clusters revealed two putative enediyne biosynthesis clusters and one type II PKS cluster for aromatic polyketide biosynthesis [5]. On the basis of structure predictions, the authors hypothesized that the three gene clusters are responsible for the production of polyketides with DNA-intercalating activities. Guided by this hypothesis, an approach that linked genome mining with bioactivity-based screenings for DNA interference was followed, leading to the isolation of lomaiviticin (1) (Figure 2) as the product of the type II PKS cluster [5]. The more recently discovered trans-AT PKS family is characterized by a large number of unusual enzymatic

108 Cell and pathway engineering

Table 1. Overview of PKS classes and applicable prediction methods and online tools [1,48]

Organization

Type I: (Multi-)modular enzymes with distinct active sites performing the catalysis of one or more cycles of chain elongation and modification

Type II: Multi-enzyme complex; each enzyme performs one reaction iteratively

Type III: Homodimeric enzymes containing all enzyme functionalities in one active site

Subclasses

Minimal domains/ enzymes

Prediction methods

Prediction tools b

Fungal monomodular (iterative)

NR PR HR

KS AT PT ACP KS AT KR DH ACP KS AT KR DH ER ACP

KS phylogeny, PT phylogenya, characterization of additional domains

SMURF [45], antiSMASH [46], NaPDoS [47]

Multimodular (non-iterative)

cis-AT

KS AT ACP

antiSMASH [46], NaPDoS [47], NP.searcher [48], ClustScan [49], CLUSEAN [50]

trans-AT

KS ACP; AT

Number of modules, AT phylogeny for substrate specificity, nature of domains within modules/conserved sequence motifs

KS phylogeny for substrate specificity, nature of domains within modules/conserved sequence motifs

Homology-based analysis of PKS proteins

PKMiner [51]

Assessment of conserved active-site residues

NRPS-PKS [52], antiSMASH [46]

KSα; KSβ; ACP

antiSMASH [46], NaPDoS [47], NP.searcher [48]

NR, non-reducing; PR, partially reducing; HR, highly reducing; KS, ketosynthase (type I PKS); AT, acyltransferase; PT, product template domain; ACP, acyl carrier protein; KR, ketoreductase; DH, dehydratase; ER, enoylreductase; KSα, ketosynthase (type II PKS); KSβ, chain length factor; discrete enzymes are separated by semicolons.
a Only valid for NR-PKS.
b Selection of algorithms for polyketide gene cluster analysis and product structure prediction. For a detailed review on bioinformatic tools for cluster annotation and structure prediction see Boddy [53]. The type of polyketide synthase can be determined by antiSMASH [46], NP.searcher [48] and NaPDoS [47]. Polyketide biosynthetic clusters can be annotated with antiSMASH [46] and NP.searcher software [48].

features that had prevented reliable prediction of structures [6]. However, phylogenetic analysis of KSs from trans-AT PKSs revealed a correlation between sequences of ketosynthase (KS) domains, which catalyze polyketide chain elongation, and their substrate specificity [7]. This correlation was first employed to predict the structure of thailandamide (2). Guided by the characteristic absorption of the predicted polyene structure, the highly unstable thailandamide A was purified [7]. The correlation between KS sequence and substrate specificity found for trans-AT PKSs complements the colinearity rules described for the prediction of cis-AT polyketides [1,8].
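The colinearity logic for cis-AT PKSs can be illustrated with a minimal sketch. It assumes the textbook relationship between the reductive domains listed in Table 1 (KR, DH, ER) and the reductive state of the β-carbon, and it assumes that every annotated domain is active. The function names and the input format (one list of domain abbreviations per module) are hypothetical and are not taken from any of the tools cited in Table 1. Real assembly lines frequently deviate from this idealized picture, in particular trans-AT PKSs.

```python
from typing import List


def beta_carbon_state(domains: List[str]) -> str:
    """Predict the beta-carbon state left behind by one PKS module.

    Idealized rule set (an assumption of this sketch):
    no KR -> keto; KR only -> hydroxyl; KR + DH -> alkene;
    KR + DH + ER -> fully reduced (alkane).
    """
    present = set(domains)
    if "KR" not in present:
        return "keto"
    if "DH" not in present:
        return "hydroxyl"
    if "ER" not in present:
        return "alkene"
    return "alkane"


def predict_chain(modules: List[List[str]]) -> List[str]:
    """Apply the rule module by module, in assembly-line order."""
    return [beta_carbon_state(m) for m in modules]


# Hypothetical four-module assembly line, one reductive level per module.
example = [
    ["KS", "AT", "ACP"],
    ["KS", "AT", "KR", "ACP"],
    ["KS", "AT", "KR", "DH", "ACP"],
    ["KS", "AT", "KR", "DH", "ER", "ACP"],
]
print(predict_chain(example))
```

A prediction of this kind only yields the reduction pattern of the core chain; building block selection (AT or KS phylogeny) and tailoring steps would have to be modeled separately.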

Directing metabolite flux into polyketide biosynthesis
In many cases, biosynthetic gene clusters are silent under standard culturing conditions or their corresponding metabolites are only produced in small amounts [3]. As a result, the compounds are often overlooked by standard analytical methods [3]. Several methods for the activation of such gene clusters have been developed. Acetyl-CoA, a

Polyketide genome mining Helfrich, Reiter and Piel 109

Figure 1. [Flowchart, not reproduced: genome mining (bottom-up, top-down, and homology-based screening by KS PCR targeting) is linked through bioinformatics (structure, functional, and physico-chemical property prediction), engineering (pathway inactivation, pathway activation, metabolic engineering, heterologous expression) or engineering-independent approaches (OSMAC, physicochemical property-directed cultivation and extraction), and analytics (traditional LC-MS and NMR, comparative metabolomics with peak-finding algorithms, isotope enrichment, MS sequence tags, DANS of complex mixtures, MS/NMR/SCD, novel data analysis and processing algorithms) to the polyketide of interest.]

Strategies for the in silico-guided identification of polyketides by genome mining and the experimental-guided connection of chemotype and genotype by homology-based mining approaches. Engineering and analytic strategies can be used in any combination. Numbers indicate the paper referenced in this review that employed the strategy to link the gene cluster identified by genome mining with its corresponding polyketide or vice versa. OSMAC, one strain many compounds [43]; DANS, differential analysis of 2D NMR spectra; SCD, X-ray single crystal diffraction [44].


Figure 2. [Chemical structures of compounds (1)–(18), not reproduced; R = H or Cl.]

Overview of polyketides discussed in this review. Bold numbers refer the reader to the section in the review that discusses the isolation of the respective metabolite. *Undefined stereogenic center.


common precursor of polyketide biosynthesis, is a key intermediate in the metabolic network of all cells. It is involved in primary metabolic pathways such as glycolysis and fatty acid and amino acid biosynthesis, as well as in secondary metabolite production. As a consequence, the proportion of acetyl-CoA that is destined for secondary metabolite production limits polyketide biosynthesis. An understanding of the link between primary and secondary metabolism and its manipulation can help to improve the yields of cryptic polyketides [9]. For example, application of FabI-inhibiting small molecules that suppress fatty acid biosynthesis in favor of enhanced polyketide production has been shown to be effective in a variety of actinomycetes [9]. In addition to such methods, which do not require genetic manipulation, microorganisms can be genetically deprived of competing metabolic pathways. This approach was used for the targeted activation of the orphan cpk gene cluster in Streptomyces coelicolor [10]. Targeted knock-outs of competing secondary metabolite biosynthetic pathways, combined with RNA polymerase engineering to increase promoter binding, resulted in enhanced production of the highly unusual polyketide alkaloid coelimycin P (3) associated with the cpk gene cluster [10]. Another strategy is the targeted awakening of silent orphan biosynthetic gene clusters by either constitutively overexpressing or knocking out pathway-specific regulators [11,12,13]. The activation of a giant 150 kb cis-AT PKS gene cluster in Streptomyces ambofaciens by overexpressing the native LAL regulator is an impressive example of how an understanding of regulatory processes can facilitate the activation of a gene cluster [13]. In this way, the production of the remarkably large 51-membered macrolide stambomycin (4), which shows antiproliferative activities, was triggered [13].
If a microorganism is not amenable to genetic engineering or the bacterium is uncultured, heterologous expression of orphan gene clusters can provide access to compounds. Facilitated by developments in the field of synthetic biology, much progress has been made in optimizing suitable host strains for heterologous expression to create ‘superhosts’ for polyketide biosynthesis. Streptomyces avermitilis is an inherently talented polyketide producer that is used for the industrial production of anthelmintic avermectin-type polyketides. Minimizing the genome of S. avermitilis by removing native secondary metabolite gene clusters resulted in a talented host for the heterologous expression of secondary metabolites. The introduction of the pld gene cluster from Streptomyces platensis in combination with the overexpression of a regulatory gene encoded in the introduced gene cluster enabled the heterologous production of the antitumor macrolide pladienolide (5) [14]. The model streptomycete S. coelicolor, a prolific secondary metabolite producer, is used as another ‘superhost’ for heterologous expression studies. The deletion of four endogenous secondary metabolite gene clusters removed competitive carbon sinks, simplified metabolite profiles and deprived S. coelicolor of its natural antibacterial activity [15]. Additional RNA polymerase and ribosome engineering to pleiotropically increase natural product production further enhanced the properties of the engineered S. coelicolor strain [15].

Homology-based mining
The tremendous progress of next-generation sequencing in terms of speed, quality, and affordability has provided access to a large number of sequenced genomes that can be mined for their polyketide biosynthetic potential. Complementing whole-genome sequencing, several homology-based PCR pre-screening strategies were used to evaluate the biosynthetic potential of strain collections or metagenomes prior to sequencing [16–21]. On the basis of conserved motifs in genes that are involved in polyketide assembly, the biosynthetic potential of an organism can be investigated independently of prior sequence information. Such motifs in KS sequences were recently used to develop PCR-based strategies to target KS sequences in strain collections, environmental DNA and complex sponge microbiomes [16,17,20,21]. A PCR-based transcriptome mining strategy for actively transcribed KS domains in Streptomyces flaveolus led to the identification of mRNAs coding for the biosynthesis of sanglifehrins (6) and mycotrienins (MYCs) (7). The metabolites were isolated and the obtained RNA sequences were used to identify the biosynthetic gene clusters from a genomic DNA library [16]. MYCs belong to the large ansamycin family and possess potent antitumor and antibacterial properties [16]. The KSβ gene of type II PKS systems provides information on the chain length and cyclization pattern of the polyketide [20]. By screening libraries generated from environmental DNA samples for KSβ sequences, several cosmids were identified that harbor type II PKS gene clusters [20,21]. Phylogenetic analysis of the obtained KSβ sequences suggested that the encoded KSβ proteins are involved in the biosynthesis of putative novel polyketide structures. Heterologous expression of the identified type II PKS gene clusters resulted in the characterization of several novel polyketides, including compounds with novel carbon skeletons (erdacin (8) and UT-X26 (9)) as well as polyketides with rare carbon skeletons and promising bioactivities against MRSA and VRE (10) [20,21]. Any other conserved gene involved in polyketide biosynthesis can be targeted in this manner [18,19].
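The idea of targeting conserved motifs with degenerate primers can be sketched in silico. The example below expands a degenerate primer written in standard IUPAC nucleotide codes into a regular expression and reports every position at which it could anneal to a template strand. The primer `GAYAC` and the template sequence are invented for illustration only; they are not primers from the cited studies, and a real screen would also consider the reverse complement, mismatches and melting temperature.

```python
import re
from typing import List

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def primer_to_regex(primer: str) -> "re.Pattern[str]":
    """Translate a degenerate primer into a regex.

    The pattern is wrapped in a lookahead so that overlapping
    binding sites are reported as well.
    """
    body = "".join(IUPAC[base] for base in primer.upper())
    return re.compile("(?=" + body + ")")


def find_primer_sites(primer: str, sequence: str) -> List[int]:
    """Return 0-based start positions of exact (degenerate) matches."""
    pattern = primer_to_regex(primer)
    return [m.start() for m in pattern.finditer(sequence.upper())]


# Hypothetical primer and template, for illustration only.
sites = find_primer_sites("GAYAC", "TTGACACGGGATACAA")
print(sites)
```

Only the forward strand is searched here; scanning the reverse complement with the second primer of a pair would be the natural extension.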

Next-generation analytics
Complementing the advances in bioinformatics and genetic engineering, much progress has been made in increasing analytical sensitivity as well as in developing new analytical techniques, computational data analysis, and processing algorithms [22–25]. Several algorithms have been established for the identification of low-abundance molecules in LC-HRMS analysis. One example is the differential analysis of extracts from wild-type Myxococcus xanthus and knockout mutants of every secondary metabolite gene cluster using the combination of a compound-finding algorithm and principal component analysis [25]. This approach led to the identification of a non-ribosomal peptide, a polyketide-sugar hybrid, and the NRPS-PKS hybrid myxoprincomide [25]. Exploiting the logic of natural product biosynthesis in combination with mass spectrometry method development led to the creation of workflows to link molecules to their respective gene clusters (Figure 1) [26,27]. These methods complement the genotype-to-phenotype strategy pursued in genome mining studies and can be readily applied to discover polyketides whose PKSs form hybrids with NRPSs or whose gene clusters encode auxiliary enzymes involved in glycosylation. Tandem MS of metabolites that harbor amino acids or sugar moieties shows characteristic fragmentation patterns that can provide small peptide or glycosyl search tags. With the observed tags the genome can subsequently be bioinformatically screened for genes that code for matching NRPS or glycosylation enzymes. These methods, termed peptidogenomics [26] or glycogenomics [27], respectively, are therefore powerful strategies to directly correlate metabolites with their corresponding gene clusters. Both methods allow the simultaneous annotation of multiple molecules, show potential for automation and might be combined with another recently developed method, molecular networking [24,26,27]. Molecular networking helps to cope with the flood of tandem MS spectra obtained from data-dependent MS experiments [24]. Detected molecules are visualized as a molecular network that shows chemical families of related molecules as interconnected groups [24]. The obtained network can be mined for molecules that harbor characteristic chemical moieties, such as amino acids or carbohydrates.
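The way a peptide search tag is read from a tandem MS spectrum can be sketched as follows. The example assumes singly charged fragment ions that belong to one consecutive ion series, and it uses a deliberately small table of monoisotopic residue masses for a few proteinogenic amino acids. The function name, the tolerance and the peak list are assumptions made for illustration; they do not reproduce the published peptidogenomics workflow, and non-ribosomal peptides often contain residues that such a table would miss.

```python
from typing import Dict, List

# Monoisotopic residue masses (Da) for a small subset of amino acids.
# Leucine and isoleucine are isobaric and cannot be distinguished here.
RESIDUE_MASS: Dict[str, float] = {
    "G": 57.02146,
    "A": 71.03711,
    "S": 87.03203,
    "P": 97.05276,
    "V": 99.06841,
    "T": 101.04768,
    "L/I": 113.08406,
}


def sequence_tag(fragment_mz: List[float], tol: float = 0.01) -> List[str]:
    """Derive a residue tag from consecutive fragment-ion m/z values.

    Each mass difference between neighboring peaks is matched against
    the residue table; unmatched gaps are reported as "?".
    """
    peaks = sorted(fragment_mz)
    tag = []
    for low, high in zip(peaks, peaks[1:]):
        delta = high - low
        hit = next(
            (aa for aa, mass in RESIDUE_MASS.items() if abs(delta - mass) <= tol),
            "?",
        )
        tag.append(hit)
    return tag


# Hypothetical fragment ladder, for illustration only.
print(sequence_tag([200.0, 271.03711, 370.10552, 483.18958]))
```

The resulting tag (here alanine, valine, leucine/isoleucine) is the kind of short query that could then be compared with the predicted substrate specificities of NRPS modules in a genome.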
The increase in sensitivity in NMR spectroscopy has facilitated structure elucidation of low-abundance compounds [22,23]. Structure elucidation of the highly unstable trans-AT PKS-derived polyketide antibiotic bacillaene (11) is an excellent example of the capabilities of NMR-based comparative metabolomics. Bacillaene is the product of a biosynthetic gene cluster found in several Bacillus species [28]. Differential analysis of 2D NMR spectra (DANS) of largely unfractionated extracts from the wild-type producer and a PKS knockout mutant was used to determine the structure of bacillaene [28]. By such comparison, the complexity of the 2D NMR spectra could be reduced to signals derived from bacillaene [28].
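The comparative principle shared by the LC-HRMS and NMR approaches in this section, namely keeping only signals that disappear when a gene cluster is inactivated, can be sketched in a few lines. The feature identifiers, the intensity values, the fold-change cutoff and the noise floor below are all hypothetical; the published studies rely on dedicated peak-finding algorithms, principal component analysis or spectral subtraction rather than on this simple ratio test.

```python
from typing import Dict, List


def wild_type_specific(
    wild_type: Dict[str, float],
    knockout: Dict[str, float],
    min_fold: float = 10.0,
    noise: float = 1.0,
) -> List[str]:
    """Return features that are strongly depleted in the knockout.

    A feature missing from the knockout is assigned the noise-floor
    intensity so that the ratio stays finite.
    """
    hits = []
    for feature, wt_intensity in wild_type.items():
        ko_intensity = max(knockout.get(feature, 0.0), noise)
        if wt_intensity / ko_intensity >= min_fold:
            hits.append(feature)
    return sorted(hits)


# Hypothetical feature tables (feature id -> signal intensity).
wt = {"f1": 1000.0, "f2": 500.0, "f3": 50.0}
ko = {"f1": 950.0, "f3": 2.0}
print(wild_type_specific(wt, ko))
```

Features that survive the filter are candidates for products of the inactivated cluster and would still need to be confirmed by isolation and structure elucidation.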

Neglected polyketide producers
Well-known bacterial producers, such as Actinomycetes, Myxobacteria, Cyanobacteria, Bacillus and Pseudomonas species, comprise only a fraction of the prokaryotic world [29,30]. Genome mining studies have prompted the reevaluation of well-known microorganisms as well as poorly studied microorganisms as polyketide producers [4]. Even though the vast majority of known bacterial secondary metabolite producers are free-living, evidence is accumulating that bacteria living in symbiosis are chemically prolific. The hypothesis that many of their metabolites might have specifically evolved in the context of symbioses (e.g. for host protection) and might therefore be difficult to find in free-living bacteria increases the potential value of symbiotic bacteria for discovering chemical novelty [7,31]. The related hypothesis that the prolific repertoire of natural products in bacteriosponges originates from their respective symbiotic community has been held for a long time. Only recently, single-cell genomics and metagenomic studies have revealed the symbiotic bacterium ‘Entotheonella factor TSY1’, a member of the candidate genus ‘Entotheonella’, as the source of almost all polyketides and peptides known from the chemically diverse sponge Theonella swinhoei type yellow, as well as several unknown metabolites. ‘Entotheonella’ phylotypes have been shown to be widely distributed in marine sponges, thus making them a potential target for the discovery of novel polyketides [32]. An unexpected finding is the identification of at least three species of cone snails that contain neurotoxic polyketides in their mucus, which were shown to be produced by the facultative bacterial symbiont Nocardiopsis alba CR167 [33]. This is noteworthy, since well-defended mollusks with thick shells have commonly been believed to be deficient in secondary metabolites. The family of pederin-like substances (12) is a class of cytotoxic polyketides produced by a taxonomically extraordinarily diverse collection of bacteria using largely orthologous trans-AT PKS genes [8].
Members of this natural product family are exclusively known to be produced by symbiotic bacteria associated with a variety of remarkably different eukaryotic hosts and show low structural overlap with secondary metabolites from free-living bacteria. Nosperin (13) and diaphorin (14) are the latest additions to this family of polyketides [31,34]. Diaphorin has been discovered in an unusual obligate defensive symbiont, the β-proteobacterium ‘Candidatus Profftella armatura’, of the Asian citrus psyllid Diaphorina citri [34]. Diaphorin biosynthetic genes comprise 15% of the highly reduced genome (<0.5 Mb) of the bacterium. Nosperin is produced by an enzymatic mosaic of trans-AT and cis-AT PKS elements in a lichen-associated Nostoc sp. cyanobacterium [31]. This is one of the few known cases of lichen-derived polyketides that are produced by the bacterial photobiont rather than the fungal partner [31]. Initial attempts to identify the compound were hampered by low amounts of substance. In order to detect the biosynthetic product of the identified trans-AT PKS gene cluster, an isotope enrichment strategy using ¹³C-labeled bicarbonate fed to the cultivated symbiont was applied to increase sensitivity for subsequent NMR-guided isolation [31]. The crude extract was subjected to repetitive rounds of LC-solid-phase extraction (SPE) NMR analysis. Detection of the predicted structural moieties eventually resulted in the discovery of nosperin (13) [31].

In addition to such mutualistic bacteria, pathogens were recently recognized by genome analyses as potential polyketide producers [29]. For example, a subdivision of the genus Burkholderia that plays significant roles in highly diverse ecological niches, the B. mallei group, contains the human and animal pathogens B. mallei and B. pseudomallei, for which the first polyketide, burkholderic acid (=malleilactone) (15), was recently described [11,12]. The burkholderic acid gene cluster is widely distributed in this group. The proposed biosynthesis includes several unusual reactions such as a head-to-head fusion of two polyketide chains as well as the incorporation of a propionyl moiety derived from methionine [11]. The discovery of burkholderic acid, showing both cytotoxic and antibacterial activity, adds to the few non-siderophore PKS and NRPS products known from human pathogenic bacteria [11,12]. Approximately 3.4% of the 6.7 Mb genome of B. thailandensis strain E264 codes for a total of eleven PKS, NRPS or hybrid clusters, including that of burkholderic acid. This and the fact that it is used as a model strain for its more virulent relatives make it particularly interesting [4,35]. A selection of polyketides from further pathogenic members of the genus Burkholderia as well as other pathogenic bacteria is listed in Table 2 [11,12,36–39]. Although apathogenic, B. thailandensis MSMB43 is worth mentioning as it harbors the biosynthetic gene cluster for the recently discovered thailanstatins A–C. Thailanstatin A (16) shows the same spliceosome-inhibiting properties as comparable compounds, but with improved physiological properties [35].
Ralstonia solanacearum, of the order Burkholderiales, produces the polyketide micacocidin (17), a siderophore that differs from the related yersiniabactin in the moieties that confer the iron-chelating properties [37]. Legionella species are another example of pathogenic polyketide sources. The fluorescent trans-AT PKS product legioliulin (18) was recently described as the first characterized polyketide from these bacteria [39].

Anaerobic bacteria are ubiquitously distributed and play important roles in ecology, industry, and human health [28,40]. However, these organisms have only recently been recognized as a source of natural products with the identification of the antibiotic closthioamide from Clostridium cellulolyticum H10, an important industrial strain. The compound shows promising activity against multiresistant bacteria. Although closthioamide is not a polyketide, it represents an important milestone in using anaerobic bacteria as a discovery resource. Genome analysis indeed revealed the presence of numerous orphan biosynthetic PKS gene clusters in such organisms [41], indicating significant potential for polyketide production. C. cellulolyticum is especially remarkable, since some of the clusters appear to be conserved among other clostridia but show no resemblance to characterized biosynthetic gene clusters [4]. Another example of unusual PKSs from the order Clostridiales is encoded in the genome of Ruminococcus albus 7, which carries a PKS/NRPS hybrid cluster that is unusual in that it lacks recognizable AT domains and is devoid of trans-AT genes [4].

A recent comprehensive study on the distribution of modular type I PKSs within all publicly available whole genome sequences and whole genome shotgun draft contig sequences has revealed 885 non-redundant PKS gene clusters, the majority of which (80%) are orphan gene clusters that are found in well-known talented polyketide producers and a wide range of poorly studied bacteria [42]. Cluster-wide sequence comparison has furthermore shown that most orphan gene clusters are dissimilar to known PKSs and are therefore promising candidates for the identification of novel structural scaffolds and biological activities [42].

Table 2. Overview of selected polyketides from pathogenic microorganisms.

Producer | Compound | Bioactivity | PKS | Ref
Burkholderia mallei, Burkholderia pseudomallei, Burkholderia thailandensis E264 | Burkholderic acid/malleilactone (15) | Cytotoxic, anti-bacterial | cis-AT, trans-AT PKS/NRPS hybrid | [11,12]
Burkholderia thailandensis E264 | Thailandamide A (2) | Not reported | trans-AT PKS | [7]
Burkholderia thailandensis MSMB43 | Thailanstatins A-C (16) | Pre-mRNA splicing inhibiting | trans-AT PKS/NRPS hybrid | [35]
Burkholderia ambifaria AMMD | Enacyloxins | Anti-bacterial | cis-AT/trans-AT PKS | [36]
Ralstonia solanacearum | Micacocidin | Siderophore | PKS/NRPS hybrid | [37]
Burkholderia gladioli | Bongkrekic acid | Respiratory toxin | trans-AT PKS | [38]
Legionella parisiensis (a), Legionella dumoffii (a) | Legioliulin (18) | Fluorophore | trans-AT PKS | [39]

a Considered nonpathogenic but also isolated from a pneumonia patient [39].


Conclusion
Next-generation sequencing technologies are providing access to large numbers of novel PKS genes at an impressive pace. These genes are being found in seemingly well-studied bacteria as well as in microorganisms that were previously neglected as natural product sources, such as pathogens, mutualists, anaerobes, or uncultivated bacteria. In recent years, increasingly sophisticated methods have become available that allow genome-based polyketide discovery to be conducted in a more systematic fashion. These methods include bioinformatic tools to predict polyketide core structures, strategies for genetic and metabolic engineering to awaken and assign orphan biosynthetic gene clusters, and analytical tools for detection and structure elucidation. These developments have paved the way to the disclosure of many novel polyketide structures in the last decade, suggesting that most cryptic pathways are indeed functional and encode metabolic novelty. Further advancing these discovery strategies, along with the exploration of new taxa or ecological niches, might lead to a significant surge in natural product numbers. Genetic tools for new talented producers and the development of robust heterologous expression systems are needed to keep up with the predicted biosynthetic potential. Overcoming these bottlenecks might lead to a repositioning of natural product research in the pharmaceutical industry.

Acknowledgement
We would like to thank Micheal C. Wilson for constructive discussions.

References and recommended reading
Papers of particular interest, published within the period of review, have been highlighted as: of special interest; of outstanding interest.

1. Hertweck C: The biosynthetic logic of polyketide diversity. Angew Chem Int Ed 2009, 48:4688-4716.

2. Zerikly M, Challis GL: Strategies for the discovery of new natural products by genome mining. ChemBioChem 2009, 10:625-633.

3. Winter JM, Behnken S, Hertweck C: Genomics-inspired discovery of natural products. Curr Opin Chem Biol 2011, 15:22-31.

4. Letzel AC, Pidot SJ, Hertweck C: A genomic approach to the cryptic secondary metabolome of the anaerobic world. Nat Prod Rep 2013, 30:392-428.
The authors give a comprehensive account of polyketide and non-ribosomal peptide synthesis potential of anaerobes.

5. Kersten RD, Lane AL, Nett M, Richter TK, Duggan BM, Dorrestein PC, Moore BS: Bioactivity-guided genome mining reveals the lomaiviticin biosynthetic gene cluster in Salinispora tropica. ChemBioChem 2013, 14:955-962.

6. Piel J: Biosynthesis of polyketides by trans-AT polyketide synthases. Nat Prod Rep 2010, 27:996-1047.

7. Nguyen T, Ishida K, Jenke-Kodama H, Dittmann E, Gurgui C, Hochmuth T, Taudien S, Platzer M, Hertweck C, Piel J: Exploiting the mosaic structure of trans-acyltransferase polyketide synthases for natural product discovery and pathway dissection. Nat Biotechnol 2008, 26:225-233.

8. Challis GL: Genome mining for novel natural product discovery. J Med Chem 2008, 51:2618-2628.

9. Craney A, Ozimok C, Pimentel-Elardo SM, Capretta A, Nodwell JR: Chemical perturbation of secondary metabolism demonstrates important links to primary metabolism. Chem Biol 2012, 19:1020-1027.

10. Gomez-Escribano JP, Song LJ, Fox DJ, Yeo V, Bibb MJ, Challis GL: Structure and biosynthesis of the unusual polyketide alkaloid coelimycin P1, a metabolic product of the cpk gene cluster of Streptomyces coelicolor M145. Chem Sci 2012, 3:2716-2720.
Depriving the genome of its naturally competing biosynthetic gene clusters and polymerase engineering resulted in the isolation of the structurally highly unusual coelimycin P1.

11. Franke J, Ishida K, Hertweck C: Genomics-driven discovery of burkholderic acid, a noncanonical, cryptic polyketide from human pathogenic Burkholderia species. Angew Chem Int Ed 2012, 51:11611-11615.

12. Biggins JB, Ternei MA, Brady SF: Malleilactone, a polyketide synthase-derived virulence factor encoded by the cryptic secondary metabolome of Burkholderia pseudomallei group pathogens. J Am Chem Soc 2012, 134:13192-13195.

13. Laureti L, Song L, Huang S, Corre C, Leblond P, Challis GL, Aigle B: Identification of a bioactive 51-membered macrolide complex by activation of a silent polyketide synthase in Streptomyces ambofaciens. Proc Natl Acad Sci U S A 2011, 108:6258-6263.
Activation of a 150 kb cryptic polyketide gene cluster led to the identification of a 51-membered macrolide polyketide.

14. Komatsu M, Uchiyama T, Omura S, Cane DE, Ikeda H: Genome-minimized Streptomyces host for the heterologous expression of secondary metabolism. Proc Natl Acad Sci U S A 2010, 107:2646-2651.

15. Gomez-Escribano JP, Bibb MJ: Engineering Streptomyces coelicolor for heterologous expression of secondary metabolite gene clusters. Microbial Biotechnol 2011, 4:207-215.
Removal of native secondary metabolite gene clusters and RNA polymerase and ribosome engineering results in a Streptomyces superhost.

16. Qu X, Lei C, Liu W: Transcriptome mining of active biosynthetic pathways and their associated products in Streptomyces flaveolus. Angew Chem Int Ed 2011, 50:9651-9654.

17. Fisch KM, Gurgui C, Heycke N, van der Sar SA, Anderson SA, Webb VL, Taudien S, Platzer M, Rubio BK, Robinson SJ et al.: Polyketide assembly lines of uncultivated sponge symbionts from structure-based gene targeting. Nat Chem Biol 2009, 5:494-501.

18. Funabashi M, Baba S, Takatsu T, Kizuka M, Ohata Y, Tanaka M, Nonaka K, Spork AP, Ducho C, Chen WC et al.: Structure-based gene targeting discovery of sphaerimicin, a bacterial translocase I inhibitor. Angew Chem Int Ed 2013, 52:11607-11611.

19. Rateb ME, Houssen WE, Arnold M, Abdelrahman MH, Deng H, Harrison WT, Okoro CK, Asenjo JA, Andrews BA, Ferguson G et al.: Chaxamycins A–D, bioactive ansamycins from a hyper-arid desert Streptomyces sp. J Nat Prod 2011, 74:1491-1499.

20. Feng Z, Kallifidas D, Brady SF: Functional analysis of environmental DNA-derived type II polyketide synthases reveals structurally diverse secondary metabolites. Proc Natl Acad Sci U S A 2011, 108:12629-12634.
A culture-independent approach to discover structurally unique natural products.

21. King RW, Bauer JD, Brady SF: An environmental DNA-derived type II polyketide biosynthetic pathway encodes the biosynthesis of the pentacyclic polyketide erdacin. Angew Chem Int Ed 2009, 48:6257-6261.
See annotation to Ref [20].

22. Forseth RR, Schroeder FC: NMR-spectroscopic analysis of mixtures: from structure to function. Curr Opin Chem Biol 2011, 15:38-47.

23. Molinski TF: NMR of natural products at the ‘nanomole-scale’. Nat Prod Rep 2010, 27:321-329.


24. Watrous J, Roach P, Alexandrov T, Heath BS, Yang JY, Kersten RD, van der Voort M, Pogliano K, Gross H, Raaijmakers JM et al.: Mass spectral molecular networking of living microbial colonies. Proc Natl Acad Sci U S A 2012, 109:1743-1752.
The combination of nanoDESI and molecular networking allows the metabolic profiling of living microbes, the visualization and organization of large tandem MS data sets, and the prioritization, detection, annotation and comparison of metabolites from different data sets.

25. Cortina NS, Krug D, Plaza A, Revermann O, Muller R: Myxoprincomide: a natural product from Myxococcus xanthus discovered by comprehensive analysis of the secondary metabolome. Angew Chem Int Ed 2012, 51:811-816.

26. Kersten RD, Yang YL, Xu YQ, Cimermancic P, Nam SJ, Fenical W, Fischbach MA, Moore BS, Dorrestein PC: A mass spectrometry-guided genome mining approach for natural product peptidogenomics. Nat Chem Biol 2011, 7:794-802.
Peptidogenomics [this paper] and glycogenomics [27] are novel genome mining strategies that allow the rapid characterization of secondary metabolites and their assignment to the corresponding biosynthetic gene cluster.

27. Kersten RD, Ziemert N, Gonzalez DJ, Duggan BM, Nizet V, Dorrestein PC, Moore BS: Glycogenomics as a mass spectrometry-guided genome-mining method for microbial glycosylated molecules. Proc Natl Acad Sci U S A 2013.
See annotation to Ref [26].

28. Butcher RA, Schroeder FC, Fischbach MA, Straight PD, Kolter R, Walsh CT, Clardy J: The identification of bacillaene, the product of the PksX megacomplex in Bacillus subtilis. Proc Natl Acad Sci U S A 2007, 104:1506-1509.

29. Pidot SJ, Coyne S, Kloss F, Hertweck C: Antibiotics from neglected bacterial sources. Int J Med Microbiol 2014, 304:14-22.

30. Zotchev SB, Sekurova ON, Katz L: Genome-based bioprospecting of microbes for new therapeutics. Curr Opin Biotechnol 2012, 23:941-947.

31. Kampa A, Gagunashvili AN, Gulder TAM, Morinaka BI, Daolio C, Godejohann M, Miao VPW, Piel J, Andresson OS: Metagenomic natural product discovery in lichen provides evidence for a family of biosynthetic pathways in diverse symbioses. Proc Natl Acad Sci U S A 2013, 110:3129-3137.

32. Wilson MC, Mori T, Rückert C, Uria AR, Helf MJ, Takada K, Gernert C, Steffens U, Heycke N, Schmitt S, Rinke C, Helfrich EJN, Brachmann AO, Gurgui C, Wakimoto T, Kracht M, Crüsemann M, Hentschel U, Abe I, Matsunaga S, Kalinowski J, Takeyama H, Piel J: An environmental bacterial taxon with a large and distinct metabolic repertoire. Nature 2014, 506:58-62.
This paper provides the first proof that the chemical diversity of a marine sponge (including several known and cryptic polyketide gene clusters) is almost exclusively produced by a single bacterium. Furthermore, a new candidate phylum with a possibly large biosynthetic potential is proposed.

33. Lin Z, Torres JP, Ammon MA, Marett L, Teichert RW, Reilly CA, Kwan JC, Hughen RW, Flores M, Tianero MD et al.: A bacterial source for mollusk pyrone polyketides. Chem Biol 2013, 20:73-81.

34. Nakabachi A, Ueoka R, Oshima K, Teta R, Mangoni A, Gurgui M, Oldham NJ, van Echten-Deckert G, Okamura K, Yamamoto K et al.: Defensive bacteriome symbiont with a drastically reduced genome. Curr Biol 2013, 23:1478-1484.

35. Liu X, Biswas S, Berg MG, Antapli CM, Xie F, Wang Q, Tang MC, Tang GL, Zhang L, Dreyfuss G et al.: Genomics-guided discovery of thailanstatins A, B, and C as pre-mRNA splicing inhibitors and antiproliferative agents from Burkholderia thailandensis MSMB43. J Nat Prod 2013, 76:685-693.

36. Mahenthiralingam E, Song L, Sass A, White J, Wilmot C, Marchbank A, Boaisha O, Paine J, Knight D, Challis GL: Enacyloxins are products of an unusual hybrid modular polyketide synthase encoded by a cryptic Burkholderia ambifaria genomic island. Chem Biol 2011, 18:665-677.


37. Kreutzer MF, Kage H, Gebhardt P, Wackler B, Saluz HP, Hoffmeister D, Nett M: Biosynthesis of a complex yersiniabactin-like natural product via the mic locus in phytopathogen Ralstonia solanacearum. Appl Environ Microbiol 2011, 77:6117-6124.

38. Möbius N, Ross C, Scherlach K, Rohm B, Roth M, Hertweck C: Biosynthesis of the respiratory toxin bongkrekic acid in the pathogenic bacterium Burkholderia gladioli. Chem Biol 2012, 19:1164-1174.

39. Ahrendt T, Miltenberger M, Haneburger I, Kirchner F, Kronenwerth M, Brachmann AO, Hilbi H, Bode HB: Biosynthesis of the natural fluorophore legioliulin from Legionella. ChemBioChem 2013, 14:1415-1418.

40. Behnken S, Hertweck C: Anaerobic bacteria as producers of antibiotics. Appl Microbiol Biotechnol 2012, 96:61-67.

41. Lincke T, Behnken S, Ishida K, Roth M, Hertweck C: Closthioamide: an unprecedented polythioamide antibiotic from the strictly anaerobic bacterium Clostridium cellulolyticum. Angew Chem Int Ed 2010, 49:2011-2013.

42. O’Brien RV, Davis RW, Khosla C, Hillenmeyer ME: Computational identification and analysis of orphan assembly-line polyketide synthases. J Antibiot 2013, 67:89-97.

43. Bode HB, Bethe B, Hofs R, Zeeck A: Big effects from small changes: possible ways to explore nature’s chemical diversity. ChemBioChem 2002, 3:619-627.

44. Inokuma Y, Yoshioka S, Ariyoshi J, Arai T, Hitora Y, Takada K, Matsunaga S, Rissanen K, Fujita M: X-ray analysis on the nanogram to microgram scale using porous complexes. Nature 2013, 495:461-466.

45. Khaldi N, Seifuddin FT, Turner G, Haft D, Nierman WC, Wolfe KH, Fedorova ND: SMURF: genomic mapping of fungal secondary metabolite clusters. Fungal Genet Biol 2010, 47:736-741.

46. Blin K, Medema MH, Kazempour D, Fischbach MA, Breitling R, Takano E, Weber T: antiSMASH 2.0 — a versatile platform for genome mining of secondary metabolite producers. Nucleic Acids Res 2013, 41:W204-W212.
antiSMASH is a web-based tool for the rapid annotation and analysis of secondary metabolite biosynthesis gene clusters in both bacterial and fungal genomes.

47. Ziemert N, Podell S, Penn K, Badger JH, Allen E, Jensen PR: The natural product domain seeker NaPDoS: a phylogeny based bioinformatic tool to classify secondary metabolite gene diversity. PLoS ONE 2012, 7:e34064.

48. Li MH, Ung PM, Zajkowski J, Garneau-Tsodikova S, Sherman DH: Automated genome mining for natural products. BMC Bioinformatics 2009, 10:185.

49. Starcevic A, Zucko J, Simunkovic J, Long PF, Cullum J, Hranueli D: ClustScan: an integrated program package for the semiautomatic annotation of modular biosynthetic gene clusters and in silico prediction of novel chemical structures. Nucleic Acids Res 2008, 36:6882-6892.

50. Weber T, Rausch C, Lopez P, Hoof I, Gaykova V, Huson DH, Wohlleben W: CLUSEAN: a computer-based framework for the automated analysis of bacterial secondary metabolite biosynthetic gene clusters. J Biotechnol 2009, 140:13-17.

51. Kim J, Yi GS: PKMiner: a database for exploring type II polyketide synthases. BMC Microbiol 2012, 12:169.

52. Ansari MZ, Yadav G, Gokhale RS, Mohanty D: NRPS-PKS: a knowledge-based resource for analysis of NRPS/PKS megasynthases. Nucleic Acids Res 2004, 32:W405-W413.

53. Boddy CN: Bioinformatics tools for genome mining of polyketide and nonribosomal peptides. J Ind Microbiol Biotechnol 2013, 41:443-450.

