Drug Discovery Today  Volume 00, Number 00  November 2013



Teaser Drug discovery and development suffers from high cost and attrition owing to toxicity. The application of unique toxicogenomic platforms has the potential to produce safer drugs and decrease research and development costs.

Current status and future prospects of toxicogenomics in drug discovery

Saifur R. Khan1, Argishti Baghdasarian1, Richard P. Fahlman2, Karim Michail1,3 and Arno G. Siraki1

1 Faculty of Pharmacy and Pharmaceutical Sciences, University of Alberta, Edmonton, Canada
2 Department of Biochemistry, Faculty of Medicine & Dentistry, University of Alberta, Edmonton, Canada
3 Faculty of Pharmacy, Alexandria University, Alexandria 21521, Egypt

In drug discovery and development (DDD), the efficacy, safety and cost of new chemical entities are the main concerns of the pharmaceutical industry. Continuously updated and stricter recommendations imposed by regulatory authorities pose growing challenges for the industry. Reliable high-throughput techniques integrated with well-designed analytical tools at all stages of DDD (termed ‘next-generation DDD’) could be one approach to obtaining new drug approvals while cutting costs and ensuring the highest level of patient safety. In this review, we describe the various components of holistic toxicogenomics with examples of applications, and discuss the various analytical tools and platforms to illustrate the current status and prospects of next-generation DDD.

DDD, currently one of the most challenging and costly businesses, begins with the identification of new drug candidates either by systematic screening or serendipity, and generally ends after the compound has successfully passed clinical trials. Typically, 90% or more of the budget is spent on clinical trials, mainly in Phase III (http://www.manhattan-institute.org/html/fda_05.htm). One of the most common causes of Phase III failure is drug-induced toxicity. Additionally, drug withdrawals from the market also contribute to the escalation of costs of DDD, with subsequent drops in new lead discoveries. In a study of 548 new lead compounds approved between 1975 and 1999, 56 acquired a black box warning (the strongest warning by the FDA for a scientifically proved significant risk of serious or even life threatening adverse effects) and 16 were withdrawn [1]. A report by the North Carolina General Assembly (NCGA) (29 March, 2012 meeting; subcommittee on pharmaceuticals liability) described how the US Food and Drug Administration (FDA) had approved approximately 300 new drug applications over the past decade, of which at least 15 have since been withdrawn from the US market (http://www.ncleg.net). Surveys indicate that, in the USA, a new lead compound takes 10–15 years on average to reach the market, with an associated cost of approximately US$1.8 billion and an average success rate of only 8% [2]. Increasing the success rate of DDD and decreasing drug attrition, although challenging, could

Saifur R. Khan is pursuing his PhD at the University of Alberta, which awarded him a doctoral recruitment scholarship. He is investigating the immunomodulatory role of antituberculosis drugs by using multiple platforms, including omics. He has been awarded the Bill Bridger Award of Excellence for highest achievement among all Alberta Innovate Graduate Student Scholarship holders in 2012. He obtained his MSc in Biotechnology from Brac University (Bangladesh) in 2008, receiving the Vice-Chancellor Medal, and a B.Pharm (Hons) in 2005 from the University of Dhaka (Bangladesh). He worked in research and development at Incepta Pharmaceuticals Ltd, Bangladesh, from 2006 to 2011.

Richard P. Fahlman is an associate professor in the Department of Biochemistry and an adjunct associate professor in the Department of Oncology in the Faculty of Medicine and Dentistry at the University of Alberta. In addition to Faculty appointments, he is also the associate director for the Institute for Biomolecular Design, the mass spectrometry and analytical core facility at the University of Alberta. He has a PhD in Biochemistry and Molecular Biology (2001) from Simon Fraser University.

Arno G. Siraki is an assistant professor in the Faculty of Pharmacy and Pharmaceutical Sciences at the University of Alberta. His graduate studies were carried out in the Leslie Dan Faculty of Pharmacy at the University of Toronto and involved mechanistic studies of drug- and xenobiotic-catalyzed oxidative stress and the application of structure–activity relationships. His postdoctoral studies at the National Institute of Environmental Health Sciences focused on associating the formation of drug free radical metabolites with protein radicals. Siraki’s current interests are in determining the role of drug free radical metabolites in adverse drug reactions.

Corresponding author: Siraki, A.G. ([email protected])
1359-6446/06/$ - see front matter © 2013 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.drudis.2013.11.001



DRUDIS-1292; No of Pages 17 REVIEWS



reduce overall costs. If lead compounds can be selected based on both possible toxicity and desirable efficacy during early stages of DDD, it would provide the opportunity to not only select a few lead compounds for preclinical and clinical studies, but also reduce the chance of lead rejection owing to undesirable toxicity during clinical and post-marketing stages. The outcome would benefit all aspects of DDD and increase patient safety. However, conventional drug toxicity testing is not sufficient to predict all the possible clinical toxicities, although it does provide a vista to understand drug-induced toxicity from a mechanistic perspective. Therefore, a potential alternative (or addition) could be the involvement of toxicogenomic approaches integrated with informatics at every stage of DDD (e.g. lead selection and optimization, and preclinical and clinical studies). In this review, we present toxicogenomics as a necessary tool to be integrated into the drug toxicity research phase of DDD. This would ideally lead to the next generation of DDD (Fig. 1). Here, toxicology studies applying high-throughput techniques (omics), predictive (i.e. based on chemical structure) and translational informatics tools are collectively termed ‘toxicogenomics’, which globally encompasses the tools of transcriptomics, epigenomics, global miRNA analysis, proteomics, metabolomics and informatics

(bioinformatics and cheminformatics). Together, these approaches could provide valuable information about drug-induced toxicity, its mechanisms and potential toxicological biomarkers for DDD in a predictable and cost-effective manner (Fig. 2) [3,4]. Although the concept of toxicogenomics was originally introduced by Nuwaysir et al. in 1999 [5], it has not yet been recommended by regulatory authorities as a mandatory toxicological approach to evaluate the safety profile of new drug compounds in either preclinical or clinical studies [6]. Omics is now widely used in research areas in both academia and industry. However, the use of the integrated form or holistic approach to omics (i.e. the combination of genomics, proteomics, epigenomics, global miRNA analysis, metabolomics, etc.) is more rare. Here, we emphasize the use of holistic toxicogenomics as a means of obtaining comprehensive information, given that there is no automatic established correlation between different omics, and their results can only be integrated manually. For example, a recent estimate of the gene content of the human genome was approximately 25 000 genes, accounting for less than 5% of the total genomic DNA [7]. Genes, the blueprint for functional proteins, can be switched on or off to synthesize different proteins. The exons of pre-mRNAs can be variably retained during splicing

Figure 1. Next-generation drug discovery and development project work-flow, illustrating its impact on reducing expenditure as well as ensuring the highest level of patient safety. Axis labels: drug safety; number of candidates and cost involvement. Abbreviations: ADME: absorption, distribution, metabolism, and excretion; PD: pharmacodynamics; PK: pharmacokinetics.

Stages shown in the figure:

Drug target selection and validation: Application of omics tools and informatics to determine the precise molecular mechanism of a disease. Select and validate the most important drug target(s).

Computer-aided drug design: Select the best drug candidates for preclinical study. Cheminformatics (toxicogenomics) will help to determine the possible toxicities of drug candidates based on chemical structure.

Preclinical study (in vitro): Use of omics tools and bioinformatics (toxicogenomics) to evaluate the toxicity profiles of the best candidates (found in the computer-aided drug design phase). Select the best of the best candidates based on in vitro risk:benefit ratio.

Preclinical study (in vivo): Use of omics tools and bioinformatics (toxicogenomics) to evaluate the toxicity profiles of candidates (found in the in vitro preclinical studies). Select the best candidates based on in vivo risk:benefit ratio. Determine the PK, PD, initial human dose and ADME for the selected best candidates.

Clinical study, phase I: Use of omics tools and bioinformatics (toxicogenomics) to evaluate the toxicity profiles of the best candidates (found in the in vivo preclinical study). Select a few best candidates for new clinical trials (Phase II and III).





Figure 2. The holistic toxicogenomics concept in drug discovery and development (DDD). The integration of information about the transcriptome, epigenomics, proteome, metabolomics and informatics (i.e. bioinformatics and cheminformatics) enables valuable information relating to drug-induced toxicities, their mechanisms and biomarkers to be revealed. Abbreviation: miRNA, microRNA.

Diagram elements: xenobiotic and drug; preclinical study (in vitro and in vivo); clinical study (phase I, II and III); toxicogenomics tools for global gene expression, global miRNA expression, global protein expression and global metabolite expression; final global gene expression; toxicogenomics database; informatics tools for data analysis, comprising (A) bioinformatics tools for class discovery, class prediction, marker selection and mechanistic pathway analysis, and (B) cheminformatics tools for chemical structure–biological response analysis. Outputs: drug toxicity evaluation for DDD, biomarker identification for drug toxicity and mechanism of drug toxicity.

to make different mature mRNAs. This process is known as alternative splicing and results in the production of different protein isoforms from the same gene. Mature mRNA can also be regulated by miRNA, which acts within a natural silencing complex that has the argonaute protein at its core and binds complementary sequences of target mRNA to cause transcript destabilization and/or translational repression [8,9]. Thus, the amount of protein translated cannot be predicted reliably from the amount of RNA transcribed. In addition, post-translational protein modifications also modulate biological functions. Therefore, integrating all aspects of the global ‘omic’ approaches could provide a far more powerful tool than any individual omics approach for investigating in depth the mechanisms of drug-induced toxicities in next-generation DDD. Another area that also affects the toxicity of drugs is drug delivery. The main focus of this area is the innovation of drug targeting strategies (e.g. active targeting, passive targeting, etc.) and the development of more effective delivery systems (e.g. gene therapy) to enhance drug effectiveness and limit adverse effects [10,11]. A key question is whether traditional toxicological approaches are sufficient to evaluate sophisticated dosage forms. Novel formulations might not only be more efficacious, but might also induce toxicity by novel mechanisms [12]. It has been shown that silver, zinc oxide and aluminosilicate nanoparticles predominantly

perturbed cell cycle genes of Drosophila in DNA microarray analysis [13]. Others found evidence of in vitro genotoxicity in human alveolar epithelial A549 cells by DNA microarray analysis when a nonviral, cationic lipid–polymer carrier (oligofectamine) was used for gene therapy [14]. The diverse range of molecular pathways investigated in these studies is difficult to cover with traditional techniques and highlights the potential of the toxicogenomic approach to lead drug discovery. In this review, we describe each of the major aspects of toxicogenomics (i.e. genomics, epigenomics, miRNA expression, proteomics and metabolomics), including high-throughput tools that could be used holistically in next-generation DDD. Furthermore, we propose that the application of holistic toxicogenomics is superior to using individual nonintegrated methods, although we recognize that there are only a few examples of holistic toxicogenomic approaches for DDD or any other similar applications. Therefore, in most cases we describe examples of individual aspects of toxicogenomics to illustrate the applicability of holistic toxicogenomics in next-generation DDD.

Gene expression analysis

Disease has been redefined to include the complex interaction between biology, the environment, genes and gene mutations [15]. During a stressful condition (e.g. drug-induced toxicity), the body produces a response. One form of that



response can be genomic (i.e. expression of RNAs, such as mRNA, tRNA, rRNA and other RNAs). The analysis of RNA expression (also known as transcriptional profiling) became a concept of choice to investigate disease conditions following initiation of the human genome project.

Prerequisites of gene expression analysis

Handling RNA requires specialist know-how because RNA is susceptible to degradation by RNase enzymes, which are present in cells and tissues and are carried on hands, labware and dust. These enzymes are stable and difficult to inactivate. Special precautions are necessary to avoid the use of degraded RNA, which can result in low yields and overrepresentation of the 5′ ends of the RNA molecules (https://www.roche-applied-science.com/sis/nucleic-acid-isolationpurification/index.jsp?id5NAP_050000). Moreover, determining the purity and integrity of RNA starting materials is crucial in any global gene expression analysis. There are various protocols for the extraction and purification of high-quality RNA using various kits and reagents (e.g. TriZol, Qiagen miRNeasy kit, Ambion mirVana kit, etc.). RNAs are physically classified into poly(A)[+] or poly(A)[−] transcripts according to the presence or absence of a poly(A) tail at their 3′ ends. It is believed that the poly(A) tail has an important role in mRNA stability, nucleocytoplasmic export and translation [16]. Classically, RNA purification involves the separation of all poly(A)[+] transcripts (thought of as coding RNA) from other noncoding RNAs by oligo(dT) cellulose/spin columns, oligo(dT)/magnetic beads or coated plates. However, there are always exceptions to the rule; for example, the poly(A)[−] mature metastasis-associated lung adenocarcinoma transcript 1 (MALAT1) is expressed at a level higher than many protein-coding genes in vivo [17]. This suggests that the concept of RNA purification should be revisited and, with it, the equating of poly(A)[+] transcripts with coding RNA. Another prerequisite of gene expression analysis is the integrity of RNA, which can be unambiguously estimated (as RNA integrity number, RIN) using microcapillary electrophoretic RNA separation, an automated high-throughput approach performed on microfabricated chips with laser-induced fluorescence detection [18]. The RIN software algorithm enables the classification of eukaryotic total RNA, based on a numbering system from 1 to 10, with 1 being the most degraded profile and 10 being the most intact. Pure and intact RNA is crucial for unbiased gene expression results from any of the genomic techniques.
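As a minimal sketch of how RIN values could gate samples before expression profiling, the snippet below partitions samples by a cutoff. Only the 1 to 10 scale comes from the description above; the sample names, the RIN values and the cutoff of 7.0 are illustrative assumptions, and a real study would choose its own threshold.

```python
# Hypothetical QC gate: keep RNA samples whose RIN meets a chosen cutoff.
# The cutoff (7.0) and the sample values are assumptions for illustration only.

def split_by_rin(rin_by_sample, cutoff=7.0):
    """Return (passed, failed) lists of sample names, sorted by name."""
    passed, failed = [], []
    for name, rin in sorted(rin_by_sample.items()):
        if not 1.0 <= rin <= 10.0:
            raise ValueError(f"RIN for {name} is outside the 1-10 scale: {rin}")
        (passed if rin >= cutoff else failed).append(name)
    return passed, failed


samples = {"liver_ctrl": 9.1, "liver_treated": 8.4, "kidney_treated": 5.2}
passed, failed = split_by_rin(samples)
print("proceed to profiling:", passed)
print("re-extract RNA:", failed)
```

Rejecting values outside the scale guards against mixing RIN with other quality scores recorded in the same sample sheet.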

Popular global genomics techniques and applications

For transcriptional profiling, various techniques, such as northern blots, microarrays, serial analysis of gene expression (SAGE) and RNA-Seq, are used depending on the particular study and the available technologies and expertise [19]. The northern blot is a straightforward and versatile technique to quantify mRNA levels in any type of sample, although it is associated with several limitations, including low sensitivity, long operational time and the delivery of false results if RNA in the sample is slightly degraded [20]. Some of these limitations have been circumvented in the evolutionary high-throughput microarray technique, which is capable of measuring up to 25 000 transcripts of RNA simultaneously and has the capacity to provide global information about the expression status of a significant portion of the genome [19,21]. For example, using DNA microarrays, Brown Norway rats were found to have significant changes in gene expression in spleen, liver, kidney and mesenteric lymph nodes when fed two different concentrations of hexachlorobenzene for 4 weeks. The gene expression profiles that confirmed the hexachlorobenzene-induced toxicities were related to the immune system, drug-metabolizing enzymes and the reproductive system [22]. This demonstrates the applicability of toxicogenomics in the preclinical (in vivo) phase of next-generation DDD.

The microarray is based on hybridization between immobilized DNA on a solid matrix and mobile nucleic acid molecules, such as probe DNA, cDNA or mRNA [6]. Given that the mobile nucleic acid molecules are either fluorescently tagged or biotinylated, they emit signals after hybridization that can be easily detected by optical methods. Microarray technology also enables monitoring of the relative expression levels for almost all genes, where RNA content can vary both qualitatively and quantitatively (e.g. between tissues from drug-treated and untreated control animals). Raw microarray data are essentially image files that have to be transformed into understandable formats. To avoid systematic variation of microarray experiments and to identify the unbiased differences in two transcript levels (control and treatment), a normalization value is calculated from fluorescence ratios by: (i) global normalization using the global median of log intensity ratios; (ii) intensity-dependent linear normalization; or (iii) intensity-dependent nonlinear normalization using a LOWESS curve [23–25]. There are several commercially available microarray chips, such as Affymetrix chips and Agilent chips (Amersham, ABI), and corresponding statistical packages, such as Microarray Suite V.5 (MAS5) and Agilent Cytogenomics. Aside from commercial statistical packages, some public algorithms are also available, such as robust multi-chip average (RMA) analysis, which is popular owing to its high level of reliability [6,26].

Although microarrays are the most popular technique for transcript profiling, they suffer from variability in genome coverage between suppliers and from compromised sensitivity owing to hybridization [27]. Microarray analysis requires cDNA of the target RNA for hybridization. cDNA synthesis introduces multiple biases and artifacts if RNA samples are short, degraded and/or small in quantity. Consequently, this leads to a biased microarray result [28]. Therefore, data validation has become an important component of DNA microarrays.

Genomics data validation


Real-time PCR (RT-PCR) is widely used as a data validation tool not only for DNA microarrays, but also for all types of gene expression analysis procedure. To validate gene expression data, a few high-confidence candidates (genes) are analyzed by quantitative RT-PCR (qRT-PCR). To ensure the reliability of DNA microarray data, the FDA started an initiative in 2005 known as the MicroArray Quality Control (MAQC) project, which aims to provide necessary guidelines for using microarrays in clinical practice and regulatory decision-making. The first phase of the MAQC project aimed to provide quality-control (QC) tools, such as QC metrics and thresholds to avoid procedural failures. Later, two human reference RNA samples were used to calibrate microarray and qRT-PCR data. The resulting microarray data sets were validated by qRT-PCR to evaluate the nature and magnitude of systematic biases that


might exist between microarrays and qRT-PCR data sets. These data sets were then made publicly available to enable individual laboratories to identify and correct procedural failures more easily [29]. An important output of the MAQC project was the companion guidance regarding genomic data submission to the FDA (http://www.fda.gov/downloads/Drugs/GuidanceComplianceRegulatoryInformation/Guidances/ucm079855.pdf). During the second phase of the MAQC project (MAQC-II), scientists started to assess the capabilities and limitations of various data analysis methods to reach a consensus on the best practices for the development and validation of predictive models based on microarray gene expression and genotyping data for personalized medicine [30]. MAQC-II produced more than 30 000 models for 13 end points from six diverse clinical and preclinical data sets. Importantly, model prediction performance was largely found to be end point dependent. The proficiency of prediction also depended on the experience level of the individual or the team. Although there was no hard-and-fast rule for data analysis, the frequently used starting points were: normalization, feature selection and classification, followed by internal validation and external validation. MAQC-II established good modeling practice guidelines and provided a solid foundation for the use of high-dimensional biological data in predictive and personalized medicine [31]. The third phase (MAQC-III), also known as Sequencing Quality Control (SEQC), is an ongoing project that aims to generate benchmark data sets with reference samples as well as to evaluate the advantages and disadvantages of various bioinformatic strategies [32].
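The sequence of steps named for MAQC-II (normalization, feature selection, classification, internal validation, external validation) can be illustrated with a deliberately small sketch. Everything in it is an assumption made for illustration: the expression values are invented, median centering stands in for the global median normalization of log ratios mentioned for microarrays, and the nearest-centroid classifier with leave-one-out validation is just one simple choice, not a method attributed to the MAQC consortium.

```python
# Toy version of the workflow: normalize -> select features -> classify ->
# internal validation -> external validation. Data and choices are hypothetical.
from statistics import mean, median


def normalize(sample):
    """Global median centering of one sample's log-scale values."""
    m = median(sample)
    return [v - m for v in sample]


def select_features(X, y, k):
    """Rank genes by absolute difference of class means; keep the top k."""
    scores = []
    for j in range(len(X[0])):
        a = mean([x[j] for x, lab in zip(X, y) if lab == 0])
        b = mean([x[j] for x, lab in zip(X, y) if lab == 1])
        scores.append((abs(a - b), j))
    return [j for _, j in sorted(scores, reverse=True)[:k]]


def train(X, y, k):
    """Nearest-centroid model: selected genes plus one mean profile per class."""
    features = select_features(X, y, k)
    centroids = {
        lab: [mean([x[j] for x, l2 in zip(X, y) if l2 == lab]) for j in features]
        for lab in set(y)
    }
    return features, centroids


def predict(centroids, x, features):
    def dist(c):
        return sum((x[j] - cj) ** 2 for j, cj in zip(features, c))
    return min(centroids, key=lambda lab: dist(centroids[lab]))


def leave_one_out_accuracy(X, y, k):
    """Internal validation; feature selection is redone inside every fold."""
    hits = 0
    for i in range(len(X)):
        features, centroids = train(X[:i] + X[i + 1:], y[:i] + y[i + 1:], k)
        hits += predict(centroids, X[i], features) == y[i]
    return hits / len(X)


# Invented log2 intensities for 4 genes; label 1 = treated, 0 = control.
train_raw = [[5.0, 5.1, 4.9, 5.0], [6.0, 6.2, 6.1, 5.9], [4.0, 3.9, 4.1, 4.0],
             [5.0, 8.0, 5.1, 4.9], [6.0, 9.1, 5.9, 6.0], [4.1, 7.0, 4.0, 3.9]]
y_train = [0, 0, 0, 1, 1, 1]
ext_raw = [[7.0, 7.1, 6.9, 7.0], [3.0, 6.2, 3.1, 2.9]]
y_ext = [0, 1]

X_train = [normalize(s) for s in train_raw]
internal = leave_one_out_accuracy(X_train, y_train, k=1)

features, centroids = train(X_train, y_train, k=1)
X_ext = [normalize(s) for s in ext_raw]
correct = sum(predict(centroids, x, features) == t for x, t in zip(X_ext, y_ext))
external = correct / len(X_ext)
print("selected gene indices:", features)
print("internal (leave-one-out) accuracy:", internal)
print("external accuracy:", external)
```

Repeating feature selection inside each leave-one-out fold keeps the held-out sample from influencing which genes are chosen, which is one reason internal and external validation are listed as separate steps.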

Genomics techniques for a new era

Both serial analysis of gene expression (SAGE) and massively parallel signature sequencing (MPSS) are better alternatives to microarrays in terms of data quality because they are probe-selection independent. However, they are more expensive and laborious [27]. RNA-Seq, which relies on ‘next-generation sequencing’ (NGS) and is also known as ‘whole transcriptome shotgun sequencing’, is a recently developed high-throughput approach to transcriptome profiling that is a superior alternative to microarrays because it does not depend on prior probe selection and is comparatively less laborious than SAGE and MPSS. However, the cost and time requirement can increase depending on the length of RNA [27]. There are two major steps in RNA sequencing: (i) reverse transcription of pure and intact RNA into complementary DNA; and (ii) cDNA library generation by fragmentation and sequencing. Once the cDNA is synthesized, it has to be fragmented to obtain the desired fragment length, which depends on the sequencing method. There are several sequencing methods, such as 454 sequencing, Illumina and SOLiD [32,33]. In contrast to microarrays, RNA-seq does not need reference sequences; in addition, it is highly reproducible and has a wide dynamic range (10⁵ for RNA-seq compared with 10² for microarrays). However, determining whether the expression level of a genomic unit (such as a gene, exon or junction) differs across experimental conditions continues to be an important question in functional genomics. Therefore, normalization of RNA-seq data is also important. Hansen et al. described a normalization methodology known as the conditional quantile normalization algorithm, which exhibits a 42% improvement in precision without loss of accuracy. This algorithm combines robust generalized


regression to remove systematic bias introduced by deterministic features, such as guanine-cytosine (GC) content, and quantile normalization to correct for global distortions [34]. Recently, Helicos BioSciences Corporation developed and commercialized the single molecule Direct RNA Sequencing (DRS) technology, a direct RNA sequencing method that avoids conversion to cDNA and other sample manipulations that might bias the results, such as ligation and amplification [35,36]. An in vivo study of the effects of aristolochic acid on rat kidneys was performed to compare RNA-Seq (Illumina Genome Analyser II) with microarray (Affymetrix Rat Genome 230 2.0 arrays) data in terms of absolute gene expression, gene expression patterns, differentially expressed genes and biological interpretation. It found that RNA-Seq was more sensitive in detecting genes with low expression levels [37]. Therefore, the sensitivity and accuracy of RNA-seq make it the tool of choice for gene expression analysis within holistic toxicogenomics in next-generation DDD, enabling drug toxicity-induced changes in the transcriptome to be predicted more reliably.
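To make the quantile normalization ingredient concrete, the sketch below implements plain quantile normalization across samples. It is not the conditional quantile normalization algorithm itself, which additionally fits a regression on features such as GC content; the function name and the toy values are assumptions for illustration.

```python
# Plain quantile normalization: force every sample to share one distribution.
# Simplified sketch; tied values are not averaged as fuller implementations do.

def quantile_normalize(samples):
    """samples: list of equal-length lists of expression values, one per sample."""
    n = len(samples[0])
    sorted_samples = [sorted(s) for s in samples]
    # Reference distribution: mean of the k-th smallest value across samples.
    reference = [sum(s[k] for s in sorted_samples) / len(samples) for k in range(n)]
    normalized = []
    for s in samples:
        order = sorted(range(n), key=lambda i: s[i])  # gene indices, low to high
        out = [0.0] * n
        for rank, gene_index in enumerate(order):
            out[gene_index] = reference[rank]
        normalized.append(out)
    return normalized


toy = [[5.0, 2.0, 3.0],   # sample A, three genes
       [4.0, 1.0, 6.0]]   # sample B, same genes
result = quantile_normalize(toy)
print(result)
```

Each gene keeps its rank within a sample, while the values themselves are replaced by the shared reference distribution, which is what removes global distortions between samples.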

Global miRNA analysis

miRNAs comprise a large family of small noncoding RNAs (approximately 21–25 nts in length) that act as post-transcriptional regulators of gene expression by either repressing or silencing mRNAs [38,39]. They are important from the global gene expression perspective because they are estimated to regulate approximately 30% of all protein-coding genes [38]. A PCR array study with Sprague–Dawley rats that were treated with acetaminophen (APAP) and carbon tetrachloride (CCl4) separately identified alterations in hepatic miRNA profiles as a response to these hepatotoxicants. APAP and CCl4 significantly increased the urinary levels of 44 and 28 miRNAs, respectively. In addition, ten of the increased miRNAs were common to both APAP and CCl4. Analysis of hepatic miRNA levels suggested that the liver was the source of the increased urinary miRNAs after APAP exposure. In addition, liver gene expression profiling by whole-genome microarrays identified eight putative miRNA target genes that were significantly altered in the liver of APAP- and CCl4-treated animals [40]. A subsequent study investigated the relation between alterations of plasma miRNA profiles in mice treated with a lethal dose of APAP and asthma development. It used RT² miRNA PCR arrays for the mouse whole genome and found significant alterations in miRNAs, namely 574-5p, 466g, 466f-3p, 375, 29c and 148a, which had been suggested to be implicated in asthma by previous studies. Importantly, these miRNA alterations were associated with significant increases in alanine aminotransferase levels, and the dose was lethal to 90% of the study mice within 48 hours [41]. An APAP-induced liver injury study in humans also confirmed the upregulation of miR-122 and miR-192, which had previously been reported in animal studies by Wang [42,43]. Another omics study including full-genome human miRNA expression analysis showed that a subtherapeutic APAP dose induced the same oxidative stress-related gene responses as a higher dose, and suggested the occurrence of possible pretoxic effects of therapeutic APAP doses [44]. Alterations in the miRNA profile can lead to changes in gene expression through degradation or repression of biologically significant mRNAs, and might have implications in molecular pathways of disease development. Another in vitro study with human





hepatoma cells (HepG2) treated with different concentrations of cadmium found that 960 genes were modulated at the maximum applied dose of 10 µM. Interestingly, four members of the 12-member miRNA let-7 gene family, which function as tumor suppressors, were among the identified gene pool [45]. Therefore, it is necessary to complement mRNA gene expression studies with miRNA expression analysis to understand properly the biological context of differentially expressed genes in any disease or drug-induced toxicity. The global expression profiles of miRNA can be measured either by microarray analysis or deep sequencing, whereas northern blotting, RT-PCR and in situ hybridization are usually used as QC tools. In microarray-based miRNA profiling, extracted miRNAs are labeled with fluorophore nucleotides at their 3′ ends by T4 RNA ligase. Labeled miRNAs are allowed to hybridize with the miRNA probes, and the fluorescent or biotin label emits signals after hybridization that can be easily detected by optical methods. Owing to their relatively smaller size, probes for miRNA profiling are usually limited in both their specificity and affinity in comparison with gene expression microarrays. As is the case with RNA transcriptional profiling, miRNA expression microarrays need validation by qRT-PCR [46]. A systematic screening of the miRNA expression profile using the miRCURY LNA miRNA microarray both in vivo (female C57BL/6 mice) and in vitro (RAW264.7 and THP1 cell lines) revealed that glucocorticoids inhibited the expression of miR-155 in a lipopolysaccharide (LPS)-induced macrophage inflammatory response [47]. Although the aim of this study was to investigate a novel inflammatory drug target, the use of miRNA arrays followed by data validation by RT-PCR was a good example of miRNA array experiment design and data analysis. Data analysis of miRNA arrays is similar to other global data analyses with the exception of the particular database used. There are several miRNA databases (Table 1) that can be useful for identifying miRNAs. Deep sequencing is a next-generation sequencing approach that is a superior alternative to microarray-based global miRNA expression analysis. Its sensitivity offers an advantage over microarrays, because the wide range of miRNA expression (from tens of thousands to just a few molecules per cell) complicates the detection of miRNAs expressed at low copy numbers when using microarray analysis [46]. However, deep sequencing can produce a significant number of sequencing errors owing to the large number of nucleotides in the genome. Single-nucleotide polymorphisms

(SNPs) are common among these errors, which can be minimized by increasing the number of sequencing runs [48]. Nielsen et al. used deep sequencing to identify miRNA profiles and abundances in porcine skeletal muscle. Using Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis, they found 212 annotated miRNAs that were associated with skeletal muscle development and regeneration, signal transduction, cell–cell and cell–extracellular matrix communication, and neural development and function [49]. Therefore, the integration of mRNA array data with miRNA data is expected to provide a more complete global genomic portrait of important human diseases and/or drug toxicities than gene expression data alone in next-generation DDD.
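One simple way to picture the integration of miRNA and mRNA data is to look for miRNA–target pairs that change in opposite directions, which is the pattern expected when a miRNA represses its target. The sketch below assumes log2 fold changes are already available for both data types and that a target list has been taken from a database such as those in Table 1; all identifiers, values and the threshold are hypothetical.

```python
# Flag miRNA-target pairs with inverse regulation (hypothetical data).
# A pair is reported when both changes reach the threshold in opposite directions.

def inverse_pairs(mirna_lfc, mrna_lfc, targets, threshold=1.0):
    """Return sorted (miRNA, gene) pairs regulated in opposite directions."""
    hits = []
    for mirna, genes in targets.items():
        m = mirna_lfc.get(mirna)
        if m is None or abs(m) < threshold:
            continue  # miRNA not measured or not clearly changed
        for gene in genes:
            g = mrna_lfc.get(gene)
            if g is not None and abs(g) >= threshold and m * g < 0:
                hits.append((mirna, gene))
    return sorted(hits)


# Invented log2 fold changes (treated vs control) and predicted targets.
mirna_lfc = {"miR-A": 2.1, "miR-B": -1.6, "miR-C": 0.3}
mrna_lfc = {"GENE1": -1.8, "GENE2": 0.2, "GENE3": 1.4, "GENE4": -2.0}
targets = {"miR-A": ["GENE1", "GENE2"],
           "miR-B": ["GENE3", "GENE4"],
           "miR-C": ["GENE4"]}

pairs = inverse_pairs(mirna_lfc, mrna_lfc, targets)
print(pairs)
```

Inverse regulation alone does not prove a direct interaction, so pairs flagged this way would still need experimental validation, for example by qRT-PCR.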

Proteomic analysis

An important question is how precise the global genomic portrait is in terms of understanding a drug-induced toxicity or a disease condition. It is well known that proteins are the ultimate mediators in all biological processes. However, there is no established correlation between genomics and proteomics (see above). Although bioinformatic tools can translate a genomic response into a proteomic one and vice versa, the result is a dry-lab prediction and not a substitute for wet-lab analysis. Therefore, the application of global proteomics as a part of holistic toxicogenomics in next-generation DDD has become a pressing need. Usually, either genomic or proteomic analyses are performed, but not always concurrently, owing to higher costs, unavailable expertise and/or absence of a common infrastructure for combining global techniques. Nevertheless, it is apparent that both techniques are complementary and require integration to step into the real toxicogenomic era. In other words, the paradigm of biological pathway analysis has to integrate proteomics for a complete portrait of gene expression at the protein level [50]. Verified gene expression, protein synthesis and its post-translational modification, as well as protein–protein interactions, are sequential steps of an intracellular microenvironment. Proteomic analysis provides important information about the intracellular microenvironment, including identification of the proteins involved (proteome mapping), protein quantification (protein profiling), protein localization, 3D structure (structural proteomics), protein function (functional proteomics), post-translational modifications and protein–protein interactions [51].


Global miRNA bioinformatics tools and databases

Name | Description and link | Bioinformatic tools
miRNAMap 2.0 (2008) | Contains experimentally verified miRNAs and their target genes in human, mouse, rat and other metazoan genomes (http://mirnamap.mbc.nctu.edu.tw/) | RNAhybrid, TargetScan and miRanda
miRBase (2006) | A searchable database of published miRNA sequences and annotations (http://mirbase.org/index.shtml) | microCosm, TargetScan and PicTar
microRNA (2008) | http://www.microrna.org/microrna/home.do | microCosm, TargetScan and PicTar
Tarbase 6.0 (2012) | Contains detailed information for each miRNA–gene interaction and the experimental validation methodologies (http://diana.cslab.ece.ntua.gr/DianaT) | DIANA-microT 3.0, DIANA-mirExTra and DIANA-MirPath
Human miRNA disease database (2008) | Contains miRNA and disease names, dysfunction evidence and the literature PubMed ID, and connects each miRNA to miRBase (Griffiths-Jones 2004) for detailed genome annotations ( | MISIM and TAM
DeepBase (2008) | Allows mapping, storage, retrieval, analysis, integration, annotation, mining and visualization of next-generation sequencing data from different technological platforms, tissues and cell lines of different organisms | dipview, lipView, nasRNA, pasRNA, rasRNA, snoSeeker, miRdeep and so on

DRUDIS-1292; No of Pages 17

Global proteomics techniques and applications
Global proteomic analysis can be classified into two main categories: classic gel-based proteomics and modern shotgun proteomics. An example of classic gel-based proteomics is 2D polyacrylamide gel electrophoresis (2D-PAGE), which was first introduced by O’Farrell in 1975 and is still the mainstay of gel-based global proteomic experiments [52]. In this technique, proteins are first resolved according to their charge (isoelectric point) on an immobilized pH gradient (IPG) gel strip using isoelectric focusing (IEF). The resolved proteins are then separated by molecular mass on an SDS-PAGE gel [53]. The protein spots on the gels are scanned and aligned with software tools to quantify their relative intensities, and the proteins can then be identified by in-gel trypsin digestion followed by liquid chromatography–mass spectrometry (LC–MS) or LC–MS/MS. The choice of stain is important for subsequent protein identification by LC–MS. For single protein identification, matrix-assisted laser desorption ionization mass spectrometry (MALDI-MS) has historically been the method of choice [54,55]. Using 2D-PAGE and MALDI-MS, 35 proteins involved in APAP-induced hepatic toxicity in mice were identified, and liver proteins that are known targets for covalent modification by N-acetyl-p-benzoquinone imine, a known toxic metabolite of APAP, were differentially modulated [56]. In another example, CFW mice were exposed to [¹⁴C]-naphthalene and lung tissues were collected for 2D-PAGE followed by MALDI-MS. As a result, 18 proteins were identified that were modified by reactive metabolites in response to naphthalene-induced lung toxicity [57]. Both examples highlight the mechanistic information revealed by global proteomics, which was validated by conventional methods, and show how in-depth information can be obtained for next-generation DDD.
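The spot quantification step can be illustrated with a small sketch. It assumes that each spot is normalized to the total spot signal on its own gel before two gels are compared; this total-signal normalization is an illustrative assumption rather than the procedure of any particular software package, and the spot names and intensities are hypothetical.

```python
# Toy sketch: relative spot intensities on two gels and their fold change.
# Spot names and raw intensities are hypothetical placeholders.

def relative_intensities(spot_volumes):
    """Express each spot as a fraction of the total signal on its gel."""
    total = sum(spot_volumes.values())
    if total <= 0:
        raise ValueError("gel has no measurable signal")
    return {spot: volume / total for spot, volume in spot_volumes.items()}


def fold_changes(control, treated):
    """Treated/control ratio of relative intensities for spots matched on both gels."""
    rel_control = relative_intensities(control)
    rel_treated = relative_intensities(treated)
    return {
        spot: rel_treated[spot] / rel_control[spot]
        for spot in rel_control
        if spot in rel_treated and rel_control[spot] > 0
    }


control_gel = {"spot1": 100.0, "spot2": 300.0, "spot3": 600.0}
treated_gel = {"spot1": 400.0, "spot2": 600.0, "spot3": 1000.0}

changes = fold_changes(control_gel, treated_gel)
print(changes)
```

Normalizing within each gel compensates for differences in total protein load or staining between gels, but it does not remove the spot-matching problem between separate gels.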
One of the challenges in 2D-PAGE-based proteomics is the spot-to-spot quantitative comparison between two separate gels. This limitation has been overcome by the development of 2D differential in-gel electrophoresis (DIGE) [58]. This approach involves prelabeling the proteins of different samples with different fluorescent dyes during sample preparation, followed by the aforementioned 2D-PAGE procedure. At the end of the protein separation step, the relative ratios of identical proteins in different samples can be determined by comparing the fluorescence intensities of the protein spots by fluorescence imaging [59–61]. An example of this approach was part of the European Union Innomed Predtox project: in rats treated with troglitazone for 14 days, 82 upregulated and 22 downregulated proteins were identified by DIGE and MALDI-MS analysis. These proteins belonged to the pathways of fatty acid metabolism, PPARα/RXR activation, oxidative stress and cholesterol biosynthesis, all of which are known targets of troglitazone [62]. The sensitivity of protein detection in 2D-PAGE can also be improved by integration with western blot (WB), where the use of antibodies provides 100–10 000-fold higher sensitivity than staining with Coomassie [63]. However, this procedure cannot provide a global portrait of protein expression, although it can be useful when looking for a particular post-translational modification. In mice treated with cocaine, protein adducts in specific regions of the liver were detected by using 2D-PAGE followed by competitive WB against antinorcocaine acid [64].
By contrast, in modern ‘shotgun’ proteomics, samples are mixed and digested before LC–MS/MS [65]. For complex samples, additional separation steps, such as multidimensional chromatography or gel electrophoresis, might be needed to reduce sample complexity before LC–MS/MS analysis. This approach is robust for qualitative analysis, but was not initially well suited to quantitative analysis. However, in 2002, Matthias Mann and his colleagues developed the stable isotope labeling by/with amino acids in cell culture (SILAC) protocol, a robust quantitative shotgun proteomics method for cultured cell lines that can readily quantify the differences between samples [66]. In classic SILAC experiments, cells are metabolically labeled with either ‘light’ (¹H or ¹²C) or ‘heavy’ (²H or ¹³C) amino acids. Initial experiments were published using leucine, but lysine and arginine are currently more widely used. The two samples are identical except that one is isotopically labeled with the ‘light’ and the other with the ‘heavy’ amino acid. Experimentally, cells grown with the light amino acids can be treated with a control and cells grown with the heavy amino acids with the drug, or vice versa. Thereafter, cells are harvested and lysed. Protein lysates are mixed in equal amounts and resolved by SDS-PAGE. After in-gel digestion, samples are analyzed by LC–MS/MS and the data are processed with software such as MaxQuant [67], a quantitative proteomics software package designed for analyzing large MS data sets. MaxQuant analysis provides both the identification of the proteins and their relative abundance between the samples, expressed as ratios of ‘light’ to ‘heavy’ peptides. An example of SILAC methodology is the investigation of the molecular mechanism of cyclosporine-induced nephrotoxicity in human embryonic kidney cells (HEK-293). Of 495 quantifiable proteins, 69 were found to be altered by cyclosporine treatment and were grouped by function [e.g. endoplasmic reticulum (ER)-stress/protein folding, metabolism/transport or cytoskeleton proteins were upregulated] [68].
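How peptide-level ‘heavy’ and ‘light’ signals become a protein-level ratio can be sketched as follows. This is a simplified illustration and not MaxQuant's actual algorithm: it assumes that each peptide already has one light and one heavy intensity and summarizes each protein by the median of its peptide ratios. The protein names and intensities are hypothetical.

```python
# Toy sketch: protein heavy/light ratios from peptide intensities (SILAC-style).
# Protein names and intensities are hypothetical placeholders.
from statistics import median


def protein_ratios(peptides):
    """peptides: iterable of (protein, light_intensity, heavy_intensity) tuples.

    Returns the median heavy/light peptide ratio for each protein.
    """
    ratios_by_protein = {}
    for protein, light, heavy in peptides:
        if light > 0 and heavy > 0:  # skip peptides missing one channel
            ratios_by_protein.setdefault(protein, []).append(heavy / light)
    return {protein: median(r) for protein, r in ratios_by_protein.items()}


peptides = [
    ("ProteinX", 1000.0, 2000.0),  # ratio 2.0
    ("ProteinX", 500.0, 1500.0),   # ratio 3.0
    ("ProteinX", 800.0, 1600.0),   # ratio 2.0
    ("ProteinY", 1200.0, 600.0),   # ratio 0.5
    ("ProteinY", 400.0, 100.0),    # ratio 0.25
]

ratios = protein_ratios(peptides)
print(ratios)
```

The median is used here because it is less sensitive than the mean to a single outlying peptide measurement.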
Typical SILAC experiments compare two or three samples with one or two variables, respectively. However, more complex experiments, such as those including four variables, can be designed to compare five samples by combining two or more triple-SILAC experiments in which at least one sample is in common. For triple-SILAC experiments, the labeling is light, medium and heavy, and the ratios are determined as ‘heavy to medium’, ‘medium to light’ and ‘heavy to light’. The median peptide ratios are then taken as the protein ratio (http://www.biochem.mpg.de/mann/SILAC/index.html). In 2011, Mann and his colleagues proposed an alternative to the classic SILAC experimental design that uses a metabolically labeled standard cell line as an internal, or spike-in, standard. The standard is mixed at a 1:1 ratio with in vivo derived samples that cannot be metabolically labeled, and the mixture is analyzed by LC–MS/MS and MaxQuant to obtain, for each experimental sample, the relative abundance of each protein as the ratio of standard (heavy peptides) to experimental sample (light peptides). Thereafter, taking the ratio between two such ratios (e.g. for two experimental samples, such as control and drug treated, each measured against the same standard) gives the relative abundance of proteins between the two experiments. Notably, this ‘spike-in’ SILAC method is not restricted to any media-specific experimental design, is relatively less expensive and can be applied to any type of cell, from cultured cells to primary cell lines or tissues of various organs. More remarkably, it can also be used as an alternative to classic in vivo experiments [69]. In classic in vivo SILAC experiments, animals must be fully metabolically labeled, and the high cost of doing so is the main impediment. To obtain fully labeled animals, there is no alternative to feeding them isotopically labeled amino acids up to at least the second filial generation (F2) [70]. One study used fully ¹³C₆-lysine-labeled mice from the F2 generation to show that Kindlin-3 is an essential factor for red blood cell function [71].
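The ‘ratio of ratios’ step in spike-in SILAC can be made concrete with a short sketch. It assumes that, for each protein, a standard/sample ratio has already been obtained for a control sample and for a drug-treated sample against the same heavy-labeled standard; dividing the two cancels the standard and leaves the drug/control ratio. The function name, protein names and ratio values are hypothetical.

```python
# Toy sketch: spike-in SILAC "ratio of ratios".
# Inputs are standard/sample ratios per protein; all values are hypothetical.

def drug_vs_control(std_over_control, std_over_drug):
    """Return drug/control abundance ratios for proteins quantified in both samples.

    (standard / control) / (standard / drug) = drug / control
    """
    result = {}
    for protein, r_control in std_over_control.items():
        r_drug = std_over_drug.get(protein)
        if r_drug is None or r_drug <= 0 or r_control <= 0:
            continue  # protein not quantified against the standard in both samples
        result[protein] = r_control / r_drug
    return result


std_over_control = {"ProteinX": 2.0, "ProteinY": 1.0, "ProteinZ": 0.5}
std_over_drug = {"ProteinX": 1.0, "ProteinY": 4.0}

relative = drug_vs_control(std_over_control, std_over_drug)
print(relative)
```

The same arithmetic applies when two triple-SILAC experiments are linked through a shared sample: the shared channel plays the role of the standard and cancels out of the final ratio.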

Metabolite analysis A drug can produce many metabolites (usually
