YGENO-08610; No. of pages: 12; 4C: Genomics xxx (2014) xxx–xxx

Contents lists available at ScienceDirect

Genomics journal homepage: www.elsevier.com/locate/ygeno

Article info

Article history: Received 24 March 2014; Accepted 5 May 2014; Available online xxxx

Keywords: Drug targets prioritization; Clostridium botulinum; In silico analysis; Interactome

a Department of Microbiology, Quaid-I-Azam University, Islamabad, Pakistan
b Indiana Center for Systems Biology and Personalized Medicine, Indiana University–Purdue University, Indianapolis, IN, USA
c School of Informatics, Indiana University, Indianapolis, IN, USA
d Department of Computer & Information Science, School of Science, Purdue University, Indianapolis, IN, USA
e Department of Microbiology and Immunology, Indiana University School of Medicine, Indianapolis, IN, USA
f Atta-ur-Rahman School of Applied Biosciences (ASAB), National University of Sciences and Technology (NUST), Islamabad, Pakistan

Abstract

A computational, in silico systems-level framework was developed to identify and prioritize antibacterial drug targets in Clostridium botulinum (Clb), the causative agent of botulism, a flaccid paralysis in humans that can be fatal in 5 to 10% of cases. The disease is difficult to control because of the emergence of drug-resistant pathogenic strains, and the only available treatment is antitoxin, which targets the neurotoxin only at the extracellular level and cannot reverse the paralysis. The framework is based on a comprehensive systems-scale analysis of genomic sequence homology and phylogenetic relationships among Clostridium, other infectious bacteria, the host and the human gut flora. First, the 2628 annotated genes of this bacterial genome were categorized into essential, non-essential and virulence genes; 39% of the essential proteins were found to interact functionally with virulence proteins, which could be key to new interventions that kill the bacteria and minimize the host damage caused by virulence factors. Second, to minimize the risk of side effects, these proteins were compared with host proteins by COG and BLAST sequence analysis to sort out the non-human homologs; 47% of the C. botulinum proteins examined were evolutionarily related to Homo sapiens proteins. Third, orthology analysis with other infectious bacteria was performed to assess broad-spectrum effects; the COGs were found mostly in Clostridia, Bacilli (Firmicutes), and alpha and beta Proteobacteria. Fourth, a comparative phylogenetic analysis with the human microbiota was performed to filter out drug targets that may also affect human gut flora. This reduced the list of candidate proteins to 131.
Fifth, the role of these putative drug targets in clostridial biological pathways was studied, and subcellular localization prediction showed that 68% of the candidate proteins are located in the cytoplasm, of which 6% are virulence proteins. This framework may serve as a general computational strategy for future drug target identification in infectious diseases. © 2014 Elsevier Inc. All rights reserved.


drug targets at the genomic level for any given pathogen [4]. Traditionally, targets have been identified through functional knowledge of individual, well-characterized protein molecules. The cost of pharmaceutical research has grown steeply and steadily over the last decade, but the time required to bring a new product to market remains around ten to fifteen years [5]. In silico methods offer speed, low cost and, more importantly, a systems view of the whole microbe. This enables researchers to ask questions that are otherwise difficult to address experimentally. Drug discovery has progressively moved from conventional ligand-based approaches toward structure-based and target-based drug design that addresses the key molecular pathways of disease. Using systems biology concepts and understanding the


Syed Aun Muhammad a,b, Safia Ahmed a,⁎, Amjad Ali f,⁎, Hui Huang b, Xiaogang Wu b, X. Frank Yang e, Anam Naz f, Jake Chen b,c,d


Prioritizing drug targets in Clostridium botulinum with a computational systems biology approach


1. Introduction


Drug target identification is the first step in the drug discovery process [1], and the completion of pathogenic bacterial genome sequences has increased the momentum of drug discovery and vaccine target identification [2,3]. Because both pathogen and host genome sequences are available, it has become easier to identify


⁎ Corresponding authors. E-mail addresses: [email protected] (S.A. Muhammad), [email protected] (S. Ahmed), [email protected] (A. Ali), [email protected] (H. Huang), [email protected] (X. Wu), [email protected] (X.F. Yang), [email protected] (A. Naz), [email protected], [email protected] (J. Chen).

http://dx.doi.org/10.1016/j.ygeno.2014.05.002 0888-7543/© 2014 Elsevier Inc. All rights reserved.

Please cite this article as: S.A. Muhammad, et al., Prioritizing drug targets in Clostridium botulinum with a computational systems biology approach, Genomics (2014), http://dx.doi.org/10.1016/j.ygeno.2014.05.002


botulism are caused by home-canned foods, while wound botulism cases reported in California have been caused by black-tar heroin injection [14]. Although a number of antibiotics are currently available, the rise of drug-resistant pathogenic strains has made C. botulinum difficult to control [15]. Clb has shown moderate resistance to chloramphenicol, tetracycline, cephalosporins and nalidixic acid [16] and high-level resistance to cycloserine, nitroimidazole, gentamicin, sulfamethoxazole and trimethoprim [17]. The only available treatment for botulism is administration of botulinum antitoxin, which can arrest the progression of paralysis and decrease its duration. However, antitoxin can only target the neurotoxins at the extracellular level and cannot reverse the paralysis caused by botulism. In addition, antibodies against the botulinum neurotoxins can cause severe hypersensitivity reactions, and their use is limited to prophylaxis [18]. Here, we report a computational, database-driven, systems-level framework to find and prioritize clostridial drug targets with confidence, based on protein- and ortholog-network analyses, genome sequence analyses and evolutionary studies (Fig. 1). This systems-level paradigm (i) integrates biological network tools and databases with systems-level information about drug targets, (ii) retrospectively and prospectively assesses network-based relationships between essential and virulent proteins, (iii) analyzes the evolutionary diversity and similarity of orthologs between species, (iv) minimizes the risk of disturbing the normal ecosystem of the endogenous human gastrointestinal microbial flora through comparative genomic analysis, (v) develops and reports the integrated, unique biochemical and functional metabolic pathways, and (vi) predicts subcellular localization of


microbe as a whole opens up new prospects for computational drug target identification [6]. In recent years, drug discovery strategies have shifted progressively toward genomic, proteomic and metabolomic approaches [7] to identify novel drug targets for the design of new defenses against antibiotic-resistant pathogens [8]. Currently, genomics, and more specifically in silico comparative and functional genomics, is widely used to identify novel drug and vaccine targets in order to develop effective antibacterial agents and vaccines against bacterial pathogens that either resist existing antibacterial regimens or lack a suitable vaccine [9,10]. It has been proposed that a systems-level analysis of the genes, proteins and interactions involved is key to gaining insight into drug targets and drug resistance. A prerequisite for such an analysis is a comprehensive protein interactome of the relevant pathogen [11]. Botulism is a rare infection caused by Clostridium botulinum (Clb), which produces one of several toxins collectively known as botulinum neurotoxins; these are the most potent toxins known and induce a potentially fatal paralytic condition in humans [12]. The toxin enters the body in one of three ways: colonization of the digestive tract by the bacterium in children (infant botulism) or adults (adult intestinal toxemia), ingestion of toxin from foods (food-borne botulism), or contamination of a wound by the bacterium (wound botulism). The fatality rate of this deadly infection increases if treatment is not appropriate and immediate [13]; botulism can be fatal in 5 to 10% of cases. An average of approximately 145 cases of botulism occur in the U.S. each year; of these, about 15% are food-borne, 65% are infant and 20% are wound botulism (CDC Reports, 2011). Botulism is most common in the U.S. west of the Mississippi River, where approximately half of the cases of food-borne


S.A. Muhammad et al. / Genomics xxx (2014) xxx–xxx


Fig. 1. Steps involved in target prioritization in C. botulinum type A by computational approach.


Features | Specificity
Total size | 3.86 Mb
Total genes | 3700
Protein coding genes | 3594
Annotated genes | 2628
Conserved hypothetical genes | 588
True hypothetical genes | 381
All RNA | 106
tRNA | 81
rRNA | 24


For screening of essential genes, the Database of Essential Genes (DEG) [21] was accessed from its website, and sequence alignments were performed using BLASTx and BLASTp with each gene and protein sequence of the pathogen. A minimum bit score of 100, more than 30% identity and an expectation value (E-value) cutoff of 0.0001 were selected for shortlisting the essential genes [22].
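The three shortlisting cutoffs can be expressed as a simple filter. The sketch below assumes the DEG search results were exported in BLAST+ tabular format (`-outfmt 6`, default twelve columns); the function name `essential_hits` and the example gene and hit IDs are hypothetical, not part of the published pipeline.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6_FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                  "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def essential_hits(blast_tsv, min_bitscore=100.0, min_identity=30.0, max_evalue=1e-4):
    """Return query IDs with at least one DEG hit passing all three cutoffs.

    Cutoffs follow the text: bit score >= 100, identity > 30%, E-value <= 1e-4.
    """
    keep = set()
    reader = csv.DictReader(io.StringIO(blast_tsv), fieldnames=OUTFMT6_FIELDS,
                            delimiter="\t")
    for row in reader:
        if (float(row["bitscore"]) >= min_bitscore
                and float(row["pident"]) > min_identity
                and float(row["evalue"]) <= max_evalue):
            keep.add(row["qseqid"])
    return keep


# Hypothetical hits: geneA passes; geneB fails identity; geneC fails E-value and bit score.
example = (
    "geneA\tdegHit1\t45.0\t300\t150\t3\t1\t300\t1\t300\t1e-50\t250\n"
    "geneB\tdegHit2\t28.0\t300\t200\t5\t1\t300\t1\t300\t1e-40\t180\n"
    "geneC\tdegHit3\t60.0\t80\t30\t0\t1\t80\t1\t80\t1e-3\t90\n"
)
print(sorted(essential_hits(example)))  # ['geneA']
```

Keeping the cutoffs as keyword arguments makes it easy to test how sensitive the essential-gene count is to each threshold.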


2.4. Constructing protein–protein interaction network


A computational systems-level analysis of genes, proteins and interactions is an important step toward determining bacterial drug targets [23]. A protein–protein interaction (PPI) network of C. botulinum was constructed for the interpretation of proteomics data. All interactome data were processed, stored and queried using the open-source database management system MySQL. Initially, we mapped proteins from the input list onto the reference database. STRING version 8.3 [24] was used to construct the proteome-scale interaction network of the annotated genes (2628 genes) of Clb, comprising 1002 nodes and 6752 edges, using only high-confidence data. A sub-network of essential and virulence proteins containing 147 nodes and 254 edges was derived from this proteome network and analyzed. Each colored node corresponds to an individual protein in the network, and the edges indicate interactions between them. The protein network was visualized using Cytoscape version 2.8.1 [25]. This network gives a comprehensive view of the connectivity among essential and virulence nodes, analogous to a road map of a city.
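The thresholding and sub-network extraction described above can be sketched in plain Python. This assumes interactions are available as (protein, protein, score) triples on a 0–1 scale (STRING flat files typically store combined scores as integers out of 1000, so 0.8 would correspond to 800); the function names and protein IDs are placeholders, and the 0.8 cutoff is the one reported in the Results.

```python
from collections import defaultdict


def build_network(edges, min_score=0.8):
    """Undirected adjacency map that keeps only interactions scoring >= min_score."""
    adj = defaultdict(set)
    for a, b, score in edges:
        if score >= min_score and a != b:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def induced_subnetwork(adj, keep):
    """Restrict the network to proteins in `keep`.

    Returns (mapped, nodes, edges): proteins present in the thresholded network,
    proteins with at least one edge inside the sub-network, and its edges.
    """
    mapped = {p for p in keep if p in adj}
    edges = {tuple(sorted((a, b))) for a in mapped for b in adj[a] if b in mapped}
    nodes = {p for edge in edges for p in edge}
    return mapped, nodes, edges


def essential_virulence_links(edges, essential, virulence):
    """Edges that directly join an essential protein to a virulence protein."""
    return sorted((a, b) for a, b in edges
                  if (a in essential and b in virulence)
                  or (a in virulence and b in essential))


# Placeholder IDs: E* essential, V* virulence, N* non-essential.
interactions = [("E1", "V1", 0.95), ("E1", "E2", 0.85), ("E2", "N1", 0.90),
                ("E3", "V1", 0.50), ("V1", "V2", 0.81)]
essential, virulence = {"E1", "E2", "E3"}, {"V1", "V2"}
adj = build_network(interactions)
mapped, nodes, edges = induced_subnetwork(adj, essential | virulence)
print(len(nodes), len(edges))                                  # 4 3
print(essential_virulence_links(edges, essential, virulence))  # [('E1', 'V1')]
```

Separating "mapped" proteins from sub-network nodes mirrors the distinction in the Results between proteins mapped to the PPI network and nodes in the essential–virulence sub-network.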


probable vaccine and drug targets. In contrast with previously reported approaches [2,3,7,11], the methodology reported in this study highlights the importance of computational systems-level analysis and has the potential to be used as a general strategy for the identification of drug targets.

2.5. Analysis of clusters of orthologous groups of proteins (COGs)


A major issue in inferring protein interactions is the determination of orthologs, whose function and interactions are highly likely to be conserved from one bacterium to another. The idea behind the COG approach was to generalize and extend the notion of a genome-specific best hit and to assess the level of one-to-one orthology between genomes, which is critical for evolutionary and functional genomics [26]. First, COGs were constructed by mapping the C. botulinum essential and virulence genes using STRING version 8.3. Second, the genome-specific hits were extended to multiple COGs of various species, from prokaryotes to eukaryotes (Homo sapiens), to identify the associated clusters of consistent best hits (STRING score > 0.5) using the same database. The COGs reflected one-to-many relationships as well as simple one-to-one associations. The COGs were examined for their diversity and distribution across species.
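A minimal sketch of the best-hit grouping, assuming each gene-to-COG assignment arrives as a (gene, COG, score) triple with STRING-style scores; the function name and IDs are placeholders, and labeling a COG by how many C. botulinum genes have it as their best hit is a simplification of the one-to-one versus one-to-many distinction, not STRING's own procedure.

```python
from collections import defaultdict


def classify_cogs(hits, min_score=0.5):
    """Keep each gene's best COG hit above min_score, then label COGs by membership."""
    best = {}  # gene -> (cog, score) of its highest-scoring hit
    for gene, cog, score in hits:
        if score > min_score and (gene not in best or score > best[gene][1]):
            best[gene] = (cog, score)

    members = defaultdict(set)  # cog -> genes whose best hit is that cog
    for gene, (cog, _) in best.items():
        members[cog].add(gene)

    return {cog: "one-to-one" if len(genes) == 1 else "one-to-many"
            for cog, genes in members.items()}


hits = [("geneA", "cogX", 0.90), ("geneA", "cogY", 0.55),
        ("geneB", "cogY", 0.70), ("geneC", "cogY", 0.60),
        ("geneD", "cogZ", 0.40)]
print(classify_cogs(hits))  # {'cogX': 'one-to-one', 'cogY': 'one-to-many'}
```

geneA's weaker hit to cogY is ignored because only the best hit per gene is kept, and geneD drops out because its only score is below 0.5.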


2.6. COG analysis in H. sapiens


2.3. Screening of essential genes


Table 1 General features of the C. botulinum type A genome.


2. Methods


2.1. Targeting C. botulinum Type A


Target identification is the first key stage in combating C. botulinum type A ATCC 19397, a pathogenic prokaryote whose genome has been sequenced [19]. The complete genome sequence of this strain reveals essential proteins that are potential targets for drug or vaccine discovery. Comparative genomic studies were performed to find the targets with a set of databases and other computational tools, summarized in Table 3. Our strategy comprised a series of analyses, from the collection of genomic and proteomic sequences to drug target identification. The methodology is illustrated in Fig. 1. This scheme can also be used more generally as a target identification pipeline applicable to many drug discovery programs.


2.2. Clostridial genome coverage


The annotated and categorized gene and protein sequences of C. botulinum type A ATCC 19397 were downloaded from the Pathema-JCVI database [20], and the expressed gene and protein sequences (including the virulence factors) were retrieved in FASTA format. The clostridial genome provides substantial information about overall conservation, role categories and hence the therapeutic potential of antimicrobial targets (broad or narrow spectrum). The circular genome of this species comprises 3,863,450 base pairs (3.86 Mb) with a total of 3700 genes. Of the 3594 protein coding genes, only 2628 have been assigned a role category. The general features of the C. botulinum type A genome are shown in Table 1. These genes encode essential and non-essential proteins, neurotoxins and other virulence factors.


Table 2 Gene ontology and functional annotation of interacting essential and virulence protein partners (protein–protein interaction) obtained using Uniprot, EMBL-EBI and DAVID functional tools.


As a first measure of the rate of bacterial COG evolution, the human annotated proteins corresponding to each COG were downloaded and retrieved from STRING version 8.3, and they

Gene locus | Uniprot_ID | Gene ontology (GO) | EMBL-CDS | Protein name/Functional annotation
CLB_1817 | A7FUT6_CLOB1 | GO: 0008168 | ABS34836.1 | Hemolysin A/methyltransferase activity
CLB_3445 | A7FYY3_CLOB1 | GO: 0008233 | ABS35681.1 | Carboxyl-terminal protease/repairing
CLB_2527 | A7FWL7_CLOB1 | GO: 0009058 | ABS33199.1 | Biosynthesis protein/hydrolase activity
CLB_2528 | A7FWL8_CLOB1 | GO: 0003824 | ABS34693.1 | Biosynthesis protein/catalytic activity
CLB_0846 | A7FS61_CLOB1 | GO: 0006355 | ABS33231.1 | Transcriptional regulator/transcription
CLB_0848 | A7FS63_CLOB1 | GO: 0051609 | ABS33111.1 | Bontoxilysin A (Neurotoxin)/pathogenesis
CLB_3408 | A7FYU6_CLOB1 | – | ABS33416.1 | Membrane protein/ABC transporter


Table 3 Databases and tools used in target identification of C. botulinum type A ATCC 19397.

Databases/software/tools | Accessibility | Utility | References
Pathema-JCVI | http://pathema.jcvi.org | Comparative genomic analyses of pathogenic bacteria | [20]
Database of Essential Genes (DEG) | http://tubic.tju.edu.cn/deg/ | For screening of essential genes | [21]
STRING database version 8.3 | http://string-db.org/ | For known and predicted protein/COGs interaction | [24]
Cytoscape version 2.8.1 | http://www.cytoscape.org/ | For network analysis and visualization | [25]
Human Microbiome Project (HMP) | http://www.hmpdacc.org/ | Comparative human microbiome genetic analysis | [31]
JGI Genome Portal database | http://genome.jgi-psf.org/ | Integrated Genomic Information | [32]
MEGA 4.0.2 package | http://www.megasoftware.net/ | Molecular evolutionary genetic analysis software | [33]
Kyoto Encyclopedia of Genes and Genomes (KEGG) | http://www.genome.jp/ | Pathways analysis and comparison | [35]
CELLO v.2.5 Subcellular Localization Predictor | http://cello.life.nctu.edu.tw/ | Subcellular localization prediction for bacteria and eukaryotes | [36]
Uniprot | http://www.uniprot.org/ | Resource of protein sequence and functional information | [38]
EMBL-EBI | http://www.ebi.ac.uk/ | Data and bioinformatics services | [39]
DAVID Bioinformatics tool | http://david.abcc.ncifcrf.gov | Functional Annotation tool | [40]
National Center for Biotechnology Information (NCBI) | http://blast.ncbi.nlm.nih.gov/ | Biomedical and genomic information source | –


2.8. Comparative analysis to human microbiota


A number of organisms are known to inhabit the gut of normal healthy individuals, and they play a fundamental role in health and disease [27]. The relationship between gut flora and humans is not merely commensal (a non-harmful coexistence) but mutualistic and symbiotic [28]. These microorganisms perform a host of useful functions, such as fermenting unused energy substrates, training the immune system, preventing growth of harmful species [29], regulating the development of the gut, producing vitamins for the host (such as biotin and vitamin K), producing hormones that direct the host to store fats, and defending against some diseases [30]. Inadvertent inhibition of proteins of these organisms is likely to result in side effects [27]. To address this possibility and screen out matching proteins, the shortlisted non-human-homolog genes and proteins of C. botulinum were compared with the proteins of the gut flora by sequence analysis. Genetic diversity was compared with the already classified phyla of gastrointestinal flora available at the Human Microbiome Project database server [31]. BLASTp was run for each non-human-homolog protein sequence of this bacterium against the gut flora protein sequences as a reference, with an E-value cutoff of 1 (E = 1), on the same database server. For construction of a phylogenetic tree by the neighbor-joining method, the complete genome of C. botulinum obtained from the JGI Genome Portal [32] was used as the input query against gastrointestinal flora reference genomes using BLASTn with an E-value cutoff of 0.01 (E = 0.01) and a bit score > 100 at the Human Microbiome Project database server. Phylogenetic analysis was carried out using the MEGA 4.0.2 package [33], a multiple sequence alignment and molecular evolutionary genetic


The phyletic patterns of C. botulinum COGs were also examined in silico in other prokaryotes to check whether these clusters are present in other bacterial species. Non-human-associated COGs were therefore mapped to the species count data available in STRING (downloaded and loaded into MySQL), and the phyletic distribution of the protein clusters in other bacteria was calculated.
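The per-class tally can be sketched as follows, assuming the STRING species-count data have been reduced to a mapping from each non-human COG to the species containing it, plus a species-to-class lookup; the function name and all identifiers below are toy placeholders, not the study's data.

```python
from collections import Counter


def phyletic_distribution(cog_species, species_class):
    """Count, for each taxonomic class, how many COGs have a member species in it."""
    counts = Counter()
    for species in cog_species.values():
        # A COG is counted once per class, however many species of that class it hits.
        classes = {species_class[s] for s in species if s in species_class}
        counts.update(classes)
    return counts


cog_species = {"cogA": {"sp1", "sp2"}, "cogB": {"sp2", "sp3"}, "cogC": {"sp4"}}
species_class = {"sp1": "Clostridia", "sp2": "Clostridia",
                 "sp3": "Bacilli", "sp4": "Betaproteobacteria"}
print(phyletic_distribution(cog_species, species_class).most_common(1))  # [('Clostridia', 2)]
```

Counting each COG once per class, rather than once per species, keeps heavily sequenced classes from dominating the distribution.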


We analyzed the integrated metabolic and protein interaction networks of C. botulinum type A and examined the correlation between these pathways. The identified essential genes with no human or gut flora homologs were mapped to the metabolic pathways in which they are involved, using the KEGG database as the source of metabolic pathway information [35]. For the analysis of individual genes, a gene ontology network was created to catalog the homologous pathways within the bacterium. This analysis also highlighted pathways that are present in the pathogen but absent from the host.
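Flagging pathways present in the pathogen but absent from the host reduces to a set difference once gene-to-pathway assignments are in hand (for example, exported from KEGG). The sketch assumes such assignments already exist as plain dictionaries; the function name and the gene and pathway identifiers are invented placeholders, not KEGG IDs.

```python
def pathogen_specific_pathways(gene_pathways, host_pathways):
    """Map each candidate gene to its pathways that have no counterpart in the host."""
    host = set(host_pathways)
    result = {}
    for gene, pathways in gene_pathways.items():
        unique = set(pathways) - host
        if unique:  # genes acting only in shared pathways are dropped
            result[gene] = sorted(unique)
    return result


gene_pathways = {"g1": ["pw_cellwall", "pw_glycolysis"],
                 "g2": ["pw_glycolysis"],
                 "g3": ["pw_twocomponent"]}
print(pathogen_specific_pathways(gene_pathways, ["pw_glycolysis"]))
# {'g1': ['pw_cellwall'], 'g3': ['pw_twocomponent']}
```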


2.10. Distribution and prediction of subcellular localization


Bacterial targets with no human or gut flora homologs that were crucial in pathways were identified and analyzed to determine their localization (cytoplasmic, membrane or secreted), for biological significance and for the distribution of these targets among cellular components, using the CELLO v.2.5 Subcellular Localization Predictor [36]. This step was required to find surface membrane proteins, which could be probable vaccine targets.
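Percentages such as those reported for the cytoplasmic fraction can be tallied from per-protein predictions. The sketch assumes CELLO's predicted compartment for each target has already been parsed into a dictionary (the labels, protein IDs and the function name `localization_summary` are illustrative), and it reads "of which X% are virulent" as a share within each compartment.

```python
from collections import Counter


def localization_summary(predictions, virulence):
    """Per compartment: (% of all targets, % of that compartment's targets that are virulent)."""
    total = len(predictions)
    by_site = Counter(predictions.values())
    summary = {}
    for site, n in by_site.items():
        n_vir = sum(1 for prot, s in predictions.items() if s == site and prot in virulence)
        summary[site] = (round(100 * n / total, 1), round(100 * n_vir / n, 1))
    return summary


predictions = {"p1": "Cytoplasmic", "p2": "Cytoplasmic", "p3": "Cytoplasmic", "p4": "Membrane"}
print(localization_summary(predictions, virulence={"p3"}))
# {'Cytoplasmic': (75.0, 33.3), 'Membrane': (25.0, 0.0)}
```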


3. Results


3.1. Screening of essential genes


The genome of C. botulinum (Clb) type A (ATCC 19397) consists of a 3.86 Mb chromosome (Table 1) that contains 3700 genes, of which 2628 have been assigned to a role category. These 2628 annotated genes and their protein sequences were downloaded from the Pathema-JCVI database. The genes were screened for essentiality, based on the concept that essential genes encode functions indispensable for cellular survival [21]; for this screening, the Database of Essential Genes (DEG) was searched for homologs of the individual bacterial genes and proteins using BLASTx and BLASTp. A minimum bit score of 100, more than 30% identity and an E-value cutoff of 10⁻⁴ were selected for shortlisting the essential genes. On the basis of these parameters, 1028 gene and protein sequences were retained. The remaining sequences, classified as non-essential, were excluded from the list, except for 64 genes encoding toxins and other virulence proteins that were retained for further evaluation.


3.2. Construction of the C. botulinum interactome map


2.7. COG analysis in other bacterial species


2.9. Pathway analysis


analysis software. A full list of 471 human gastrointestinal microflora is available at the Human Microbiome Project website [34].
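For readers unfamiliar with the neighbor-joining step, the sketch below is a textbook Saitou–Nei implementation on a precomputed distance matrix. It is not MEGA's code: MEGA also estimates the distances from the alignment and may break ties differently, and the four-taxon matrix is a toy additive example rather than clostridial data.

```python
def neighbor_joining(names, dist):
    """Textbook neighbor joining (Saitou and Nei, 1987) returning an unrooted Newick string.

    names: taxon labels (at least three); dist: symmetric matrix of pairwise distances.
    """
    nodes = list(names)
    d = {(a, b): dist[i][j] for i, a in enumerate(names) for j, b in enumerate(names)}
    while len(nodes) > 3:
        n = len(nodes)
        r = {a: sum(d[(a, b)] for b in nodes) for a in nodes}  # row sums; d[(a, a)] == 0
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                a, b = nodes[i], nodes[j]
                q = (n - 2) * d[(a, b)] - r[a] - r[b]  # Q-criterion
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # Branch lengths from the new internal node u to the joined pair.
        la = d[(a, b)] / 2 + (r[a] - r[b]) / (2 * (n - 2))
        lb = d[(a, b)] - la
        u = f"({a}:{la:g},{b}:{lb:g})"
        others = [k for k in nodes if k not in (a, b)]
        for k in others:
            d[(u, k)] = d[(k, u)] = (d[(a, k)] + d[(b, k)] - d[(a, b)]) / 2
        d[(u, u)] = 0.0
        nodes = others + [u]
    # The last three nodes form an unrooted star; solve for their branch lengths.
    x, y, z = nodes
    lx = (d[(x, y)] + d[(x, z)] - d[(y, z)]) / 2
    return f"({x}:{lx:g},{y}:{d[(x, y)] - lx:g},{z}:{d[(x, z)] - lx:g});"


# Toy additive distances for the tree ((A:1,B:2):5,(C:3,D:4)).
names = ["A", "B", "C", "D"]
dist = [[0, 3, 9, 10],
        [3, 0, 10, 11],
        [9, 10, 0, 7],
        [10, 11, 7, 0]]
print(neighbor_joining(names, dist))  # (C:3,D:4,(A:1,B:2):5);
```

Because the toy distances are additive, neighbor joining recovers the generating tree exactly, including all branch lengths.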


were mapped, and a relationship with C. botulinum was found. The human-associated COG interaction network was generated and visualized using Cytoscape and then screened out from the rest of the COGs. The protein sequences of essential and virulent bacterial proteins were also used as queries to separate human-homolog from non-human-homolog proteins, searching the human reference protein database with the BLASTp algorithm at the National Center for Biotechnology Information (NCBI) server, with an E-value cutoff of 1 (E = 1) and a match over more than 50% of the query length.
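The human-homolog screen amounts to marking a query as human-homologous when any hit satisfies both cutoffs (E ≤ 1 and more than 50% of the query aligned) and keeping the rest. This sketch assumes hits were exported with BLAST+ tabular columns such as `-outfmt "6 qseqid evalue qstart qend qlen"`; it measures coverage per high-scoring pair, which is a simplification of merging all pairs for a query, and the function name and protein IDs are placeholders.

```python
def non_human_homologs(queries, hits, max_evalue=1.0, min_query_cov=0.5):
    """Return queries with no hit meeting both the E-value and query-coverage cutoffs.

    hits: (qseqid, evalue, qstart, qend, qlen) tuples, one per high-scoring pair.
    """
    homologous = set()
    for qseqid, evalue, qstart, qend, qlen in hits:
        coverage = (abs(qend - qstart) + 1) / qlen  # fraction of the query aligned in this HSP
        if evalue <= max_evalue and coverage > min_query_cov:
            homologous.add(qseqid)
    return sorted(set(queries) - homologous)


hits = [("p1", 1e-30, 1, 300, 400),   # 75% coverage, significant: human homolog
        ("p2", 0.5, 1, 100, 400),     # only 25% of the query aligned
        ("p3", 5.0, 10, 390, 400)]    # E-value above the cutoff
print(non_human_homologs(["p1", "p2", "p3", "p4"], hits))  # ['p2', 'p3', 'p4']
```

Called with `min_query_cov=0.0`, the same function would approximate the gut-flora screen in Section 2.8, for which only an E-value cutoff is stated.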


To understand and analyze the topology and functional annotation of protein–protein interaction (PPI) in the clostridial system, the protein network was constructed (2628 proteins) using the STRING database.


To better understand the nature of these potential drug targets, a total of 1092 selected essential (1028) and virulence (64) genes of C. botulinum (Clb) were classified according to the Clusters of Orthologous Groups of Proteins (COGs) nomenclature. Using the STRING functional annotation data source, 170 candidate best-hit orthologs (score > 0.5) were obtained. Inspection of these mapped COGs, which contain one-to-one and one-to-many orthologs, revealed subtle differences. An interactive network was generated to analyze the COGs and their associated partners (Figs. 3 and 4). Of the 170 targets, 147 COGs belong to essential genes and 23 to virulence genes. The COGs include proteins ranging from the same species to distantly related species, including H. sapiens, which reflects a high level of evolutionary conservation of protein sequences among species. The COGs were categorized according to the common macromolecular complexes and other


3.3. Identification of orthologous classifiers


that damages host cells. Similarly, the gene loci CLB_2527 (GO: 0009058) and CLB_2528 (GO: 0003824), whose ontology relates to toxin production, interact directly with 3-oxoacyl-(acyl-carrier-protein) synthase III, which is necessary for fatty acid and phospholipid metabolism. This analysis also revealed that BotR (gene locus: CLB_0846; GO: 0006355), the transcriptional regulator that controls expression of the paralysis-causing botulinum neurotoxin (botA; gene locus: CLB_0848; GO: 0051609), and its components are associated with a probable essential gene product, CLB_3408 (Fig. 2B). CLB_3408 (EMBL-CDS: ABS33416.1) is annotated as an ABC transporter permease protein involved in transporting nutrients such as ions, peptides, sugars and amino acids. Thus, targeting CLB_3408 could lead to elimination of the bacteria as well as inhibition of neurotoxin production. The annotation and gene ontology (GO) of these interacting essential and virulence proteins were obtained using Uniprot [38], EMBL-EBI databases [39] and the DAVID bioinformatics functional annotation tool [40] (Table 2).


The entire protein network consisted of high-scoring interaction partners (STRING confidence score ≥ 0.8). The main component of this network contained 1002 nodes and 6752 edges (nodes represent proteins and edges represent interactions). This interaction network was largely segregated into three neighborhoods: one enriched for non-essential nodes (shown in green), a second enriched for essential genes (shown in blue), and a third enriched for virulence proteins (shown in red) (Fig. 2A). Topological analysis of this network revealed direct interactions between critical essential and virulence proteins that could be probable drug targets. To view these interactions, a sub-network of the essential proteins predicted above (1028 proteins) and the 64 virulence proteins (total: 1092 proteins), containing 158 nodes and 267 edges, was created. Of these 1092 proteins, a total of 224 were mapped to the PPI network. Visual analysis of this sub-network showed that the essential proteins interact with each other and are involved in amino acid biosynthesis, signal transduction systems, the phosphotransferase system, ribosomal proteins and ABC transporters (gene ontology of these genes was obtained using the Uniprot database and the DAVID bioinformatics functional annotation tool). Some of the virulence proteins interact directly with essential proteins (Fig. 2B). Importantly, because these essential proteins interact with virulence factors, targeting them may affect the production or secretion of virulence proteins such as toxins. For example, as shown in Fig. 2B, the virulence protein hemolysin A (gene locus: CLB_1817; GO: 0008168), which has methyltransferase activity and is responsible for hemolysis of host erythrocytes (pathogenesis), interacts directly with a modifying/repairing protein, an essential protein (gene locus: CLB_3445; GO: 0008233).
Hemolysin A is a cytotoxin that assembles water-soluble monomers into a membrane-bound heptameric structure on the surface of erythrocytes, penetrating the cell membrane and leading to cell lysis [37]. Therefore, targeting this interaction may not only affect the function of the essential protein but may also interfere with production or secretion of hemolysin A, a virulence factor


Fig. 2. Interactome analysis and construction of the protein network. (A) Protein–protein interaction network of C. botulinum. Nodes and edges (lines) denote proteins and their interactions, respectively. The network contains non-essential proteins (green nodes), essential proteins (blue nodes) and virulence proteins (red nodes), with 1002 nodes and 6752 edges using high-confidence data (> 0.8). (B) Essential proteins (blue nodes) interacting with virulence proteins (red nodes). In total, 224 proteins were mapped to the protein network. Small black arrows indicate direct interactions between essential and virulence nodes. Thick red edges highlight the interactions between CLB_1817 (virulence node) and CLB_3445 (essential node) and between CLB_0846 and CLB_3408. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)


Fig. 3. Construction of the ortholog network and associated partners in Homo sapiens. (A) 80 orthologs and their interconnected partners showing evolutionary association with Homo sapiens proteins. The network contains 3649 nodes and 10,874 edges. Nodes are colored by functional category: dark green, amino acid/protein biosynthesis; blue, carbohydrates; dark red, DNA; light green, ATPs; aqua, molecular chaperones; dark purple, lipids; purple, signal transduction system; yellow, ABC transporters; and brown, porphyrin metabolism. (B) Classification of the functional categories and frequency distribution of these orthologs. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 4. Functional distribution of non-human homolog COGs in other bacteria. Classification and functional category distribution of non-human homolog orthologs in other bacteria. The matrix graph with grading of blue color (from dark to light) represents the frequency of COGs in that functional class. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)


functional categories. These Clb orthologs include clusters found in both unicellular organisms (other prokaryotes) and eukaryotes (H. sapiens).

3.4. Orthology analysis in H. sapiens

The COG analysis above revealed that 47% of a set of C. botulinum proteins are evolutionarily related to H. sapiens proteins (Fig. 3A). Because of this homology between the selected proteins and human orthologs, targeting these bacterial components carries a potential risk of side effects. We therefore performed further analysis to exclude proteins that had human homologs. Of the 170 mapped COGs, 80 Clb COGs have a homologous relationship with the human proteome data available in the STRING database. These clusters were categorized into 9 functional groups, with the largest share (40%) belonging to amino acid and protein biosynthesis (Fig. 3B). Comparative genomics was then applied to exclude the proteins with human homologs and to retain only proteins without human homologs as potential drug targets. Each essential and virulent Clb protein sequence was searched for homology against human protein sequences using standard BLASTp at the NCBI server, with an E-value cutoff of 1 (E = 1) and a match of more than 50% of the query length. The non-homologous sequences (those showing no similarity with human sequences) were considered putative antibacterial drug targets. Of the 1092 protein sequences submitted as BLAST queries, 494 were non-human homologous sequences (463 essential and 31 virulent proteins), while the rest had human homologs.

3.5. Comparative analysis with other prokaryotes

The distribution and occurrence in other bacterial genomes of the C. botulinum (Clb) genes lacking human homologs were examined to assess the presence of gene clusters belonging to the same functional class. Such diversity can help broaden the spectrum of antibacterial drugs. The 494 non-human homologous protein sequences (90 COGs) of Clb were mapped against the species counts in the STRING functional annotation database to study the distribution of orthologs across the phylotypes of the domain Bacteria. These clusters were categorized into 12 functional groups; the largest share was related to amino acid biosynthesis (35%), followed by ribosomal RNA function (15%) (Fig. 4). The analysis showed that Clb COGs are distributed across various bacterial genera, mostly in Clostridia, Bacilli (Firmicutes), and alpha- and beta-Proteobacteria. Of the 90 COGs, only COG0745, a transcriptional regulator, is abundant and distributed in all studied phylotypes (from Actinobacteria to Spirochaetes). Forty-six COGs were present only in Proteobacteria (57%) and Firmicutes (43%), while the remaining orthologs were found in various proportions (Fig. 5). Overall, the distribution indicates a broad diversity of these targets across bacterial species.

Fig. 5. Distribution and occurrence of orthologs in other prokaryotes. Cluster heat map showing the distribution of Clostridium botulinum COGs in other bacterial genera. Rows represent bacterial genera and columns represent COGs; each cell is colored according to the occurrence of that COG in the genus. The line graph groups the genera into their respective phylotypes.

3.6. Comparative analysis with human microbiota

Most antibiotics target healthy bacteria of the human microbiota as well as pathogens, with a prolonged impact on the normal gut flora [41]. Therefore, to design novel drugs that specifically target pathogenic bacteria, the 494 short-listed essential and virulent
proteins of C. botulinum (Clb) were examined and compared with the gut flora by sequence analysis. To analyze evolutionary relationships with the gut flora, the complete Clb genome was used as the input query against the gastrointestinal flora reference genomes with the BLASTn option at the Human Microbiome Project database server. Of the 494 non-human homologs submitted, 363 sequences showed similarity with the gut flora, leaving 131 (26%) sequences with no gut flora homolog. These 131 sequences comprise 108 essential and 23 virulent proteins; the complete list of these putative antibacterial targets is provided as supporting information (Supplemental Table 1). A phylogenetic tree was constructed using the neighbor-joining method (Fig. 6A). The tree showed that Clb is evolutionarily related to four phylotypes, the Firmicutes, Bacteroidetes, Actinobacteria and Proteobacteria, with most taxa belonging to the Firmicutes. The genetic diversity of Clb at the phylum level was also compared with the existing phylum classification of the gut flora in the Human Microbiome Project (HMP) database. According to the HMP classification, the gut flora comprises 8 phylotypes, of which Firmicutes are the most abundant in the human gut (51%), followed by Proteobacteria (19%) and Bacteroidetes (17%). By comparison, our analysis showed that the genetic diversity and similarity of Clb are restricted to 4 phylotypes, with Firmicutes the most abundant phylum (86%), followed by Bacteroidetes (8%) and Actinobacteria (3%) (Fig. 6B).

Fig. 6. Phylogenetic relationship and comparative phylum distribution against the human gut flora. (A) Dendrogram showing the phylogenetic and evolutionary relationships of C. botulinum type A with gut flora. The evolutionary history was inferred using the neighbor-joining (NJ) method; phylogenetic analyses were conducted in MEGA4. Most of the taxa belong to the Firmicutes, followed by the Bacteroidetes. (B) Comparative analysis of phylum diversity and classification against gut flora. C. botulinum is evolutionarily related to 4 phylotypes, compared with the 8 phylotypes in the Human Microbiome Project gut flora list. Lines indicate the occurrence of the counted phylotypes in the corresponding colors.

3.7. Identifying pathways of potential drug targets in C. botulinum

The roles of the 131 sorted genes in C. botulinum metabolic pathways were studied and correlated using the KEGG database (Fig. 7). An illustrative map of the metabolic network was created to give a systems-level view of this pathogen's cellular machinery. The 131 input genes encode proteins involved in important biochemical reactions spanning 45 KEGG pathways, including carbohydrate metabolism, protein synthesis and degradation, two-component systems, the phosphotransferase system, motility, cofactor biosynthesis, DNA synthesis and others.
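Mapping the 131 targets onto KEGG pathways amounts to inverting a gene-to-pathway annotation and counting how many distinct pathways are touched (45 in this study). The sketch below assumes the annotation has already been retrieved (for example from KEGG) as a dictionary from gene to pathway names; the genes and memberships are hypothetical, and ranking pathways by target count is an illustrative choice rather than part of the reported analysis.

```python
from collections import defaultdict


def genes_by_pathway(gene_to_pathways):
    """Invert a gene -> pathways annotation into pathway -> sorted gene list.

    Pathways are ordered by the number of target genes they contain (most first),
    then alphabetically, so heavily targeted pathways surface at the top.
    """
    pathway_to_genes = defaultdict(set)
    for gene, pathways in gene_to_pathways.items():
        for pathway in pathways:
            pathway_to_genes[pathway].add(gene)
    ranked = sorted(pathway_to_genes.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return {pathway: sorted(genes) for pathway, genes in ranked}


# Hypothetical target genes and pathway memberships, for illustration only.
TOY_GENES = {
    "gene1": ["Peptidoglycan biosynthesis"],
    "gene2": ["Two-component system", "Phosphotransferase system (PTS)"],
    "gene3": ["Two-component system"],
}

if __name__ == "__main__":
    result = genes_by_pathway(TOY_GENES)
    print(len(result), "pathways")  # 3 pathways
    for pathway, genes in result.items():
        print(pathway, genes)
```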

Fig. 7. Biochemical pathway analysis. Gene ontology (GO) network presenting the role category of each interconnected metabolic pathway encoded by the 131 genes.

3.8. Prediction of sub-cellular localization

To further understand the nature of the candidate genes, the subcellular localization of each identified target was predicted with the CELLO v.2.5 subcellular localization predictor [36]. Of these 131 targets, 68% were predicted to be located in the cytoplasm (of which 6% were virulence proteins), followed by membrane-associated proteins (22%) and extracellular proteins (10%). Among the membrane-associated and extracellular proteins, 4% were virulence proteins (Fig. 8). The gene ontology of the localized proteins is presented in Fig. 9. These membrane-associated and extracellular proteins may serve as potential vaccine targets in addition to antibacterial targets (Supplemental Table 2).

4. Discussion

There is a need to find alternative essential drug targets and therapeutic regimens for botulism, because antitoxin is the only available treatment and resistance is emerging to aminoglycosides, tetracycline, sulfamethoxazole–trimethoprim and chloramphenicol; some clostridial strains are even resistant to cephalosporins and vancomycin [15–17]. Targeting an essential gene, necessary for bacterial cell survival, may provide an effective way to control infection; thus, the products of essential microbial genes are promising new targets for antibacterial drugs [21]. However, for infections such as botulism, in which bacterial toxins play a major part in causing disease, inhibiting bacterial growth alone is not sufficient. Antimicrobial drugs are usually discouraged in botulism therapy because lysis of botulinum cells can worsen neurological symptoms. In this study, therefore, a group of essential genes that specifically interact with virulence genes in C. botulinum was identified using computational tools, software and databases. These targets have the potential to be bactericidal while also disrupting the production or secretion of botulinum neurotoxins. In silico system-level analysis revealed 131
proteins in C. botulinum, excluding targets homologous to human genes or to human gut flora, with high potential as anti-Clb drug targets that could be bactericidal, reduce bacterial toxin activity and have minimized side effects. It is now well established that better insights into biological systems can be obtained from available genomic databases and in silico systems-level studies, since biological systems are complex networks of many processes. Broader insights into the suitability of a potential target can be obtained by exploring the pathways and whole-cell systems relevant to the disease through comparative analysis. Computational systems-level analysis can help assess the criticality of individual proteins by examining alternate pathways and mechanisms that may naturally compensate for the absence of a given protein. As the need for systems-level studies has become increasingly apparent, a wide spectrum of techniques has been developed and applied for the simulation and analysis of biochemical systems [42]. Here, a protein–protein interaction (PPI) network of the annotated genes of this bacterium was constructed to study the interactions among non-essential, essential and virulent proteins (Fig. 2). Networks built from protein–protein interactions and functional associations are much more comprehensive and nearly complete in their coverage, especially because integrated databases combine experimentally mapped interactions with those predicted by well-established computational methods [43].

The main virulence factor of C. botulinum, the neurotoxin, is produced as a non-covalently bound complex with two or more nontoxic protein components, hemagglutinin (ha) and nontoxic nonhemagglutinin (ntnh). The phenotypic and genotypic designations of the proteins and genes involved in the production of clostridial neurotoxins have been unconventional and inconsistent [44]. However, Porfirio et al. (1997) reported that the synthesis of many virulence factors is regulated [45], and in 1998 Marvaud and coauthors showed that botR, a transcriptional regulator of botulinum neurotoxin, interacts directly with the promoter region upstream of the ntnh–bont and ha genes [46]. In this study, the protein network (Fig. 2B) revealed that BotR interacts with botulinum neurotoxin (gene name: botA, gene locus: CLB_0848), hemagglutinin components (gene name: ha33, gene locus: CLB_0845), non-toxic nonhemagglutinin subunits (gene name: ntnh, gene locus: CLB_0847), and the ABC transporter permease protein (gene locus: CLB_3408). ABC transporters mediate the transport of nutrients (ions, peptides, sugars and amino acids) into the cell and also respond to other external stimuli [47]. These proteins are important for bacterial growth; therefore, targeting CLB_3408 (an essential protein) may also interfere with regulation by the transcription factor (botR) and with neurotoxin (botA) production.

Besides being essential to the pathogen, an ideal target should have several other properties, such as no similarity to human proteins whose inhibition could cause adverse drug effects; this aspect was analyzed at multiple levels in this study. The simplest level was to check the queried target for sequence similarity with all proteins of the human proteome in the NCBI and UniProt databases. Sequence information is readily available for hundreds of bacteria, and this type of analysis has been reported for pathogenic genomes such as Helicobacter pylori [2], Pseudomonas aeruginosa [48] and Mycobacterium tuberculosis [49]. Fig. 3 shows an interaction network of the Clusters of Orthologous Groups of Proteins (COGs) and their associated partners, together with the frequency distribution of orthologs in humans.

The largest shares of the COGs shared with humans (40%) and with other prokaryotes (35%) contain genes involved in amino acid biosynthesis that are common to C. botulinum. In addition, the heat map of the occurrence and distribution of C. botulinum COGs in other bacterial genera, including pathogenic species, indicates the potential spectrum of drugs against multiple pathogenic strains.
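The two subtraction stages discussed above (removing human homologs, then gut-flora homologs) are plain set differences once the homolog sets are known; in this study they reduced the candidate pool to 494 non-human homologs and then to 131 targets (108 essential, 23 virulent). The sketch below assumes the homolog sets have already been produced by the BLAST searches, and the protein IDs are hypothetical.

```python
def subtractive_filter(essential, virulent, human_homologs, gut_homologs):
    """Apply the two subtraction stages and report survivors per stage (a sketch)."""
    essential, virulent = set(essential), set(virulent)
    candidates = essential | virulent
    non_human = candidates - set(human_homologs)   # stage 1: drop human homologs
    targets = non_human - set(gut_homologs)        # stage 2: drop gut-flora homologs
    return {
        "candidates": len(candidates),
        "non_human_homologs": len(non_human),
        "targets": sorted(targets),
        "essential_targets": sorted(targets & essential),
        "virulent_targets": sorted(targets & virulent),
    }


if __name__ == "__main__":
    # Hypothetical protein IDs and homolog sets, for illustration only.
    report = subtractive_filter(
        essential={"P1", "P2", "P3", "P4"},
        virulent={"V1", "V2"},
        human_homologs={"P1"},
        gut_homologs={"P2", "V2"},
    )
    for stage, value in report.items():
        print(stage, value)
```

Keeping the stages separate makes it easy to report the attrition at each step, which is how the counts in Sections 3.4 and 3.6 are presented.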

Fig. 8. Sub-cellular localization prediction. Occurrence and distribution of proteins in sub-cellular compartments.


Fig. 9. Gene ontology (GO) of sub-cellular localized proteins (bacterial targets). GO of essential and virulence proteins localized to the cytoplasmic, extracellular and membrane-associated compartments. Blue bars denote essential proteins and red bars virulence proteins. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)


Human normal flora plays important roles in balancing the ecology of the body system [50–54]. These microbes normally live in the digestive tract and perform a number of useful functions for their hosts. Their collective metabolic activity is comparable to that of a virtual organ, which has led to gut bacteria being termed a “forgotten” organ [52]. Altering the gut bacterial population, for example by taking drugs such as broad-spectrum antibiotics, may affect the host's health [55]. Non-human homologous sequences therefore cannot yet be considered confident targets, because a number of microorganisms inhabit the gut of a normal healthy individual and help balance the ecology of the body system. Eliminating proteins with high similarity to proteins of the gut flora further reduces the risk of side effects. The 494 short-listed non-human homologs of C. botulinum were compared with the gut flora by sequence analysis at the Human Microbiome Project database server: 363 of them showed similarity with the gut flora (and, being homologous to gut flora, were excluded as putative drug targets), leaving 131 sequences (26%) without gut flora homologs as confident drug targets. A similar comparison against gut flora has also been reported for identifying drug targets in M. tuberculosis [56]. The dendrogram (Fig. 6) showed the phylogenetic and evolutionary relationships of C. botulinum with these gut flora; most of the taxa belonged to the Firmicutes, followed by the Bacteroidetes. Comparative analysis of phylum diversity showed that C. botulinum is evolutionarily related to 4 phylotypes, compared with the 8 phylotypes in the Human Microbiome Project gut flora list.

The biological pathways in which the products of these genes are involved were determined, and the functional roles of these putative target proteins revealed important integrated pathways involved in various structural and cellular functions essential for the survival of C. botulinum [49]. Gene ontology (GO) analysis of each interconnected metabolic pathway encoded by the 131 genes showed that most of the target proteins are cytoplasmic, followed by membrane-associated proteins. In this network, D-alanine–D-alanine ligase (ddl) is an essential enzyme of the bacterial peptidoglycan biosynthetic pathway and has no human homolog, which makes it an attractive target for developing new antibiotics. Sharma et al. [57] also indicated that ddl may be a good drug target in Neisseria meningitidis, Haemophilus influenzae, Staphylococcus aureus, H. pylori, Burkholderia pseudomallei, Vibrio cholerae and Acinetobacter baumannii, as well as Aeromonas hydrophila. Because the subcellular location of a protein is important for understanding its function, computational localization analysis of these 131 bacterial targets [36] suggested that some of them could also be probable vaccine targets.

5. Conclusion

In silico comparative genomics and computational system-level studies are highly efficient and allow the potential pool of genes and proteins to be restricted to a much smaller number than the whole-genome complement of the bacterium. In the present study, 131 drug targets in C. botulinum were identified using the concepts of essentiality, comparative sequence analysis and evolutionary relationships. This approach can be used in drug target identification and drug discovery for effective treatment regimens to control botulism. The pipeline developed provides a rational schema for drug targeting as well as preservative identification, which could save substantial money, resources and time in the drug discovery process.

Supplementary data to this article can be found online at http://dx.doi.org/10.1016/j.ygeno.2014.05.002.

Author contributions

MS, AS, and CJ conceived and developed the protocol and wrote the paper. MS, YX, HH, AN, and WX performed the work. CJ and AA supervised and analyzed the overall research.

Competing interests

The authors declare that they have no competing interests.

Acknowledgments

We are thankful to all IUPUI System Biology and Personalized Medicine lab members for their support and encouragement while carrying out this work. We also thank Dr. Fan Zhang for his valuable input. We also acknowledge the Higher Education Commission (HEC) Pakistan for their support.

References

[1] J.N. Chan, C. Nislow, A. Emili, Recent advances and method development for drug target identification, Trends Pharmacol. Sci. 31 (2010) 82–88.
[2] A. Dutta, S.K. Singh, P. Ghosh, R. Mukherjee, S. Mitter, et al., In silico identification of potential therapeutic targets in the human pathogen Helicobacter pylori, In Silico Biol. 6 (1–2) (2006) 43–47.
[3] A.N. Sarangi, R. Aggarwal, Q. Rahman, N. Trivedi, Subtractive genomics approach for in silico identification and characterization of novel drug targets in Neisseria meningitides serogroup B, J. Comput. Sci. Syst. Biol. 2 (2009) 255–258.
[4] T. Owa, Drug target validation and identification of secondary drug target effects using DNA microarrays, Tanpakushitsu Kakusan Koso 52 (2007) 1808–1809.
[5] F. Humer, Innovation in the Pharmaceutical Industry—Future Prospects, 2005. Available: http://www.roche.com/fbh_zvg05_e.pdf (Accessed 12 May 2006).
[6] K. Mdluli, M. Spigelman, Novel targets for tuberculosis drug discovery, Curr. Opin. Pharmacol. 6 (5) (2006) 459–467.
[7] J. Lin, J. Qian, Systems biology approach to integrative comparative genomics, Expert Rev. Proteomics 4 (2007) 107–119.
[8] M.A. Fischbach, C.T. Walsh, Antibiotics for emerging pathogens, Science 325 (2009) 1089–1093.
[9] Y. Ji, The role of genomics in the discovery of novel targets for antibiotic therapy, Pharmacogenomics 3 (2002) 315–323.
[10] M.J. Pucci, Use of genomics to select antibacterial targets, Biochem. Pharmacol. 71 (2006) 1066–1072.
[11] K. Raman, N. Chandra, Mycobacterium tuberculosis interactome analysis unravels potential pathways to drug resistance, BMC Microbiol. 8 (2008) 234.
[12] M.W. Peck, Clostridium botulinum and the safety of minimally heated, chilled foods: an emerging issue? J. Appl. Microbiol. 101 (2006) 556–570.
[13] J. Sobel, Botulism, Clin. Infect. Dis. 41 (8) (2005) 1167–1173.
[14] R.L. Shapiro, C. Hatheway, D.L. Swerdlow, Botulism in the United States: a clinical and epidemiologic review, Ann. Intern. Med. 129 (3) (1998) 221–228.
[15] C.A. Arias, B.E. Murray, Antibiotic-resistant bugs in the 21st century—a clinical super-challenge, N. Engl. J. Med. 360 (2009) 439–443.
[16] L. Boyanova, R. Kolarov, I. Mitov, Antimicrobial resistance and the management of anaerobic infections, Expert Rev. Anti-Infect. Ther. 5 (4) (2007) 685–701.
[17] M. Dezfulian, V.R. Dowell Jr., Cultural and physiological characteristics and antimicrobial susceptibility of Clostridium botulinum isolates from foodborne and infant botulism cases, J. Clin. Microbiol. 11 (6) (1980) 604–609.
[18] S.S. Long, L.K. Pickering, C.G. Prober, Principles and Practice of Pediatric Infectious Diseases, Churchill Livingstone Elsevier, Philadelphia, PA, 2008 (Chapter 189).
[19] P.K. Fang, B.H. Raphael, S.E. Maslanka, S. Cai, B.R. Singh, Analysis of genomic differences among Clostridium botulinum type A1 strains, BMC Genomics 11 (2010) 725.
[20] L.M. Brinkac, T. Davidsen, E. Beck, A. Ganapathy, et al., Pathema: a clade-specific bioinformatics resource center for pathogen research, Nucleic Acids Res. 38 (2010) D408–D414.
[21] Z. Ren, L. Yan, DEG 5.0, a database of essential genes in both prokaryotes and eukaryotes, Nucleic Acids Res. 32 (2004) D271.
[22] D. Barh, S. Tiwari, N. Jain, A. Ali, A.R. Santos, et al., In silico subtractive genomics for target identification in human bacterial pathogens, Drug Dev. Res. 72 (2011) 1–16.
[23] L. Wong, G. Liu, Protein interactome analysis for countering pathogen drug resistance, J. Comput. Sci. Technol. 25 (1) (2010) 1–7.
[24] D. Szklarczyk, A. Franceschini, M. Kuhn, M. Simonovic, A. Roth, et al., The STRING database in 2011: functional interaction networks of proteins globally integrated and scored, Nucleic Acids Res. 39 (2011) D561–D568.
[25] M.S. Cline, M. Smoot, E. Cerami, et al., Integration of biological networks and gene expression data using Cytoscape, Nat. Protoc. 2 (10) (2007) 2366–2382.
[26] E.V. Koonin, Orthologs, paralogs, and evolutionary genomics, Annu. Rev. Genet. 39 (2005) 309–338.
[27] D.C. Savage, Microbial ecology of the gastrointestinal tract, Annu. Rev. Microbiol. 31 (1977) 107–133.
[28] C.L. Sears, A dynamic partnership: celebrating our gut flora, Anaerobe 11 (5) (2005) 247–251.
[29] F. Guarner, J.R. Malagelada, Gut flora in health and disease, Lancet 361 (9356) (2003) 512–519.
[30] J. Keeley, Good bacteria trigger proteins to protect the gut, EurekAlert, Howard Hughes Medical Institute, 2004 (Accessed March 1, 2011).

[31] G.D. Wu, J.D. Lewis, Analysis of the human gut microbiome and association with disease, Clin. Gastroenterol. Hepatol. 11 (7) (2013) 774–777.
[32] I.V. Grigoriev, H. Nordberg, I. Shabalov, A. Aerts, et al., The genome portal of the Department of Energy Joint Genome Institute, Nucleic Acids Res. 40 (2012) D26–D32.
[33] K. Tamura, J. Dudley, M. Nei, S. Kumar, MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0, Mol. Biol. Evol. 24 (2007) 1596–1599.
[34] http://www.hmpdacc-resources.org/hmp_catalog/ (Accessed 5 March 2011).
[35] S.F. Altschul, T.L. Madden, A.A. Schäffer, J. Zhang, Z. Zhang, et al., Gapped BLAST and PSI-BLAST: a new generation of protein database search programs, Nucleic Acids Res. 25 (1997) 3389–3402.
[36] Z. Lu, D. Szafron, R. Greiner, P. Lu, D.S. Wishart, Predicting subcellular localization of proteins using machine-learned classifiers, Bioinformatics 20 (2004) 547–556.
[37] M. Montoya, E. Gouaux, β-Barrel membrane protein folding and structure viewed through the lens of α-hemolysin, Biochim. Biophys. Acta 1609 (2003) 19–27.
[38] T.J. Smith, K.K. Hill, B.T. Foley, J.C. Detter, A.C. Munk, D.C. Bruce, et al., Analysis of the neurotoxin complex genes in Clostridium botulinum A1–A4 and B1 strains: BoNT/A3, /Ba4 and /B1 clusters are located within plasmids, PLoS ONE 2 (12) (2007) e1271.
[39] N. Furnham, J.S. Garavelli, J.M. Thornton, Protein data resources, Encyclopedia of Biological Chemistry, 3(2), Academic Press, 2013, pp. 611–617.
[40] D.W. Huang, B.T. Sherman, R.A. Lempicki, Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources, Nat. Protoc. 4 (1) (2009) 44–57.
[41] B.P. Willing, S.L. Russell, B.B. Finlay, Shifting the balance: antibiotic effects on host–microbiota mutualism, Nat. Rev. Microbiol. 9 (2011) 233–243.
[42] W. Materi, D.S. Wishart, Computational systems biology in drug discovery and development: methods and applications, Drug Discov. Today 12 (7–8) (2007) 295–303.
[43] C. von Mering, L.J. Jensen, M. Kuhn, S. Chaffron, T. Doerks, B. Krüger, B. Snel, P. Bork, STRING 7—recent developments in the integration and prediction of protein interactions, Nucleic Acids Res. 35 (2007) D358–D362.
[44] M. Sebaihia, M.W. Peck, N.P. Minton, et al., Genome sequence of a proteolytic (Group I) Clostridium botulinum strain Hall A and comparative analysis of the clostridial genomes, Genome Res. 17 (2007) 1082–1092.
[45] Z. Porfirio, S.M. Prado, M.D.C. Vancetto, F. Fratelli, E.W. Alves, I. Raw, B.L. Fernandes, A.C.M. Camargo, I. Lebrun, Specific peptides of casein pancreatic digestion enhance the production of tetanus toxin, J. Appl. Microbiol. 83 (1997) 678–684.
[46] J.C. Marvaud, M. Gibert, K. Inoue, V. Fujinaga, K. Oguma, M.R. Popoff, botR is a positive regulator of botulinum neurotoxin and associated nontoxic protein genes in Clostridium botulinum A, Mol. Microbiol. 29 (1998) 1009–1018.
[47] A.L. Davidson, E. Dassa, C. Orelle, J. Chen, Structure, function, and evolution of bacterial ATP-binding cassette systems, Microbiol. Mol. Biol. Rev. 72 (2) (2008) 317–364.
[48] D. Perumal, C.S. Lim, K.R. Sakharkar, M.K. Sakharkar, Differential genome analyses of metabolic enzymes in Pseudomonas aeruginosa for drug target identification, In Silico Biol. 7 (4–5) (2007) 453–465.
[49] N.K. Singh, S.M. Selvam, P. Chakravarthy, T-iDT: tool for identification of drug target in bacteria and validation by Mycobacterium tuberculosis, In Silico Biol. 6 (6) (2006) 485–493.
[50] L. Beaugerie, J.C. Petit, Microbial–gut interactions in health and disease, Best Pract. Res. Clin. Gastroenterol. 18 (2) (2004) 337–352.
[51] F. Guarner, J.R. Malagelada, Gut flora in health and disease, Lancet 361 (9356) (2003) 512–519.
[52] A.M. O'Hara, F. Shanahan, The gut flora as a forgotten organ, EMBO Rep. 7 (7) (2006) 688–693.
[53] J.P. Hugot, Inflammatory bowel disease: a complex group of genetic disorders, Best Pract. Res. Clin. Gastroenterol. 18 (3) (2004) 451–462.
[54] A.G. Wynne, A.L. McCartney, J. Brostoff, B.N. Hudspith, G.R. Gibson, An in vitro assessment of the effects of broad-spectrum antibiotics on the human gut microflora and concomitant isolation of a Lactobacillus plantarum with anti-Candida activities, Anaerobe 10 (3) (2004) 165–169.
[55] R.J. Carman, M.A. Simon, H. Fernández, M.A. Miller, M.J. Bartholomew, Ciprofloxacin at low levels disrupts colonization resistance of human fecal microflora growing in chemostats, Regul. Toxicol. Pharmacol. 40 (3) (2004) 319–326.
[56] K. Raman, K. Yeturu, N. Chandra, targetTB: a target identification pipeline for Mycobacterium tuberculosis through an interactome, reactome and genome-scale structural analysis, BMC Syst. Biol. 2 (2008) 109.
[57] V. Sharma, P. Gupta, A. Dixit, In silico identification of putative drug targets from different metabolic pathways of Aeromonas hydrophila, In Silico Biol. 8 (2008) 26.
