Chapter 14

Computational Phosphorylation Network Reconstruction: Methods and Resources

Guangyou Duan and Dirk Walther

Abstract

The succession of protein activation and deactivation mediated by phosphorylation and dephosphorylation events constitutes a key mechanism of molecular information transfer in cellular systems. To deduce the details of those molecular information cascades and networks has been a central goal pursued by both experimental and computational approaches. Many computational network reconstruction methods employing an array of different statistical learning methods have been developed to infer phosphorylation networks based on different types of molecular data sets such as protein sequence, protein structure, or phosphoproteomics data. In this chapter, different computational network inference methods and resources for biological network reconstruction with a particular focus on phosphorylation networks are surveyed.

Key words Biological networks, Phosphorylation networks, Network inference, Protein–protein interaction, Reverse engineering

1 Introduction

Biological network reconstruction by computational means (also termed reverse engineering or network inference) is a burgeoning field fueled by the broad application of holistic molecular measurement technologies, collectively called OMICS technologies, that probe the quantitative level of many, ideally all, molecules of a particular molecular organizational domain. Computational biological network reconstruction methods offer an effective approach towards describing the relationships among a large number of biological molecules such as transcripts (mRNA), microRNA, proteins (or phosphoproteins), and metabolites. Based on the reconstructed networks, complex regulatory processes such as transcriptional regulation and signaling cascades can be better understood and modeled in silico [1]. Beyond detailed processes, overall design principles of biological networks can be revealed by applying concepts for graph-based network analysis [2]. The computational reconstruction

Waltraud X. Schulze (ed.), Plant Phosphoproteomics: Methods and Protocols, Methods in Molecular Biology, vol. 1306, DOI 10.1007/978-1-4939-2648-0_14, © Springer Science+Business Media New York 2015


of molecular networks formed by proteins connected via phosphorylation/dephosphorylation events is the focus of this chapter. Protein posttranslational modifications (PTMs) modulate the function of the targeted protein by influencing its folding, stability, or interaction with other molecules [3]. Hundreds of PTMs have been characterized and can be detected experimentally on a broad scale [4]. Phosphorylation, the PTM that attaches a phosphate group to serine, threonine, or tyrosine residues, plays a central role in many biological processes such as the cell cycle or the response to extracellular stimuli. With the advancement of experimental techniques such as mass-spectrometry-(MS)-based phosphoproteomics, thousands of proteins in different species have been found to undergo phosphorylation [5, 6]. Protein phosphorylation is reversibly regulated by kinases and phosphatases that phosphorylate and dephosphorylate proteins, respectively. Kinases and phosphatases have been widely studied, and hundreds to thousands of them have been detected in many different species [7, 8]. The succession of protein activation and deactivation mediated by phosphorylation and dephosphorylation events has been shown to act as a molecular information flow mechanism among proteins in a broad range of cellular regulatory processes [9, 10]. Kinase–substrate [11] and phosphatase–substrate interactions [12] are the elementary processes forming the molecular phosphorylation network. The phosphorylation network is a special type of protein interaction network and describes the set of proteins undergoing phosphorylation/dephosphorylation events and the interactions among them. Different experimental methods have been developed to reconstruct the phosphorylation network, especially kinase–substrate relations, including in vivo kinase assays and protein microarrays; see refs. 13, 14 for reviews of related experimental techniques.
Because experimental methods are time-consuming and costly, computational methods have been designed to predict kinase–substrate relations based on kinase-specific protein sequences and structures or other types of prior information [15, 16]. With the development of phosphoproteomics in general, and quantitative phosphoproteomics techniques in particular, more condition-specific and temporal phosphorylation information has become available, which can be utilized to further improve the performance of phosphorylation network reconstruction [17–19]. In the following paragraphs, general computational network inference methods and different types of phosphorylation network reconstruction methods and resources will be introduced.


2 Biological Network Inference

Network models of molecular biological processes consist of nodes representing biological entities, often molecules such as genes or proteins, and edges representing the relationships among them. These relationships can be direct physical interactions or abstract relationships such as involvement in the same processes, correlated behavior, or the like. The need to efficiently infer molecular networks emerged with the advent of OMICS techniques that are both highly parallel (measuring many molecules at once) and high-throughput (allowing many different conditions to be tested). The broad application of OMICS technologies aimed at the detection of particular molecule classes resulted in large quantitative datasets of, for example, mRNAs [20], microRNAs (miRNA) [21], proteins and phosphoproteins [22], and metabolites [23]. These large datasets necessitated the development and application of network inference methods to better understand the complex relationships between molecules. Apart from OMICS molecular profiling data, other types of information resources such as scientific literature [24], molecular sequence homology information [25], or structural information [26] have been utilized to infer biological networks as well. As the first broadly adopted OMICS technology with available standardized technological platforms (gene expression microarrays) and with its potential to probe essentially all transcripts of an organism, gene expression profiling has prompted the development of many computational approaches to infer gene regulatory networks [27, 28]. In general, biological network reconstruction can be approached following two different strategies: top-down or bottom-up [29].
The bottom-up approach infers biological networks based on statistical association measures (e.g., correlation-based, regression-based), whereas the top-down approach reconstructs networks based on prior network models (e.g., Bayesian networks, differential equations). Within bottom-up approaches, correlation-based methods are used most widely as they are also intuitively best suited for the task of biological network reconstruction. The basic rationale behind these methods is the concept of guilt-by-association: molecules that behave similarly with regard to their quantitative levels can be expected to engage in similar processes or may even be causally linked. As similarity measures, Pearson correlation, mutual information, partial correlation, and Granger causality are used most frequently, combined with various scoring schemes to eliminate false associations (e.g., Context Likelihood of Relatedness, Maximum Relevance/Minimum Redundancy); see ref. 30 for a review of correlation-based gene regulatory network reconstruction. While Granger causality aims to infer cause-effect


relationships [31], and partial correlation measures help reduce indirect correlations, Pearson correlation generally suffers from two limitations: high correlation alone is not a sufficient criterion to establish causality, and wrong edges can arise from indirect correlations, in which two molecules appear to be correlated when, in reality, they are both linked to a third molecule influencing them both. Model-based methods, where measured data are scored against a preformed network, include Bayesian networks, which combine probability and graph theory [32], and Graphical Gaussian Models [33, 34]. Other top-down approaches employ regression-based methods (e.g., the review of autoregressive models in ref. 35; Lasso in ref. 36), differential equation-based methods [37], logic-based methods [38], and others. Further strategies have been developed to integrate different methods, such as TopkNet [39] and DELDBN [40]. Table 1 lists selected biological network reconstruction packages available for the popular statistical programming language R. They can be used directly or customized according to the specific requirements of the user. Considering that each type of inference method has its own strengths and weaknesses, community efforts such as the Dialogue for Reverse Engineering Assessments and Methods (DREAM) [47] have been launched to objectively evaluate and compare different methods. In general, however, gauging the performance of network inference methods is challenging because the true underlying networks, against which computational methods could be trained and compared, are often not known. Here, artificially created networks can serve this purpose [48]. However, this approach

Table 1 List of available R packages for biological network reconstruction

| Name | Method | URL |
|---|---|---|
| minet [41] | MI | http://www.bioconductor.org/packages/release/bioc/html/minet.html |
| Inferelator [42] | Regression | Available upon request from authors |
| Cascade [43] | Regression | http://www-math.u-strasbg.fr/genpred/spip.php?article3 |
| Gemonet | Regression + BN | http://www.stat.nthu.edu.tw/~wphsieh/causalinference.html |
| G1DBN [44] | DBN | http://cran.r-project.org/web/packages/G1DBN/index.html |
| ARTIVA [45] | DBN | http://cran.r-project.org/web/packages/ARTIVA/index.html |
| TSCGM [46] | Graphical Gaussian Model (GGM) + DBN | http://www.math.rug.nl/stat/uploads/Main/CGsparse_1.0.zip |

DBN dynamic Bayesian network, GGM graphical Gaussian model, BN Bayesian network, MI mutual information


comes with the risk that the particular method of network creation used in the simulation may unduly favor particular inference methods.
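The indirect-correlation problem described above can be illustrated with a small simulation. The following Python sketch is not taken from any of the packages in Table 1. The simulated profiles, the noise level, the sample size, and the function names are all assumptions chosen for illustration. A hypothetical regulator C drives two molecules A and B that have no direct link. Pearson correlation nevertheless reports a strong A–B association, whereas the first-order partial correlation, which conditions on C, is close to zero.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y, conditioned on z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Simulated (hypothetical) abundance profiles over 500 samples:
# C influences both A and B; A and B are not directly connected.
rng = random.Random(0)
c = [rng.gauss(0, 1) for _ in range(500)]
a = [value + rng.gauss(0, 0.3) for value in c]
b = [value + rng.gauss(0, 0.3) for value in c]

r_ab = pearson(a, b)                 # high, although A and B are only indirectly linked
r_ab_given_c = partial_corr(a, b, c)  # near zero once C is accounted for

print(f"Pearson r(A,B)           = {r_ab:.2f}")
print(f"Partial r(A,B | C)       = {r_ab_given_c:.2f}")
```

In a real dataset the confounding molecule may not have been measured at all, in which case partial correlation cannot remove the spurious edge.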

3 Phosphorylation Network Inference

Proteomics (the family of OMICS technologies that probe protein levels) and phosphoproteomics (the specific detection of phosphorylated proteins) have seen rapid progress in recent years [49]. Applied technologies range from mass spectrometry (MS) [50] to protein chips [51]. Specialized databases have been established to capture the generated information (see Table 2 for a list of publicly available databases that collect experimentally verified phosphorylation sites; some of them also provide the corresponding kinase information). Although many biological network inference methods have been developed already, their application to phosphorylation network reconstruction remains challenging because of the inherent limitations of phosphoproteomics datasets such as limited sample numbers or very few time points as well as limited coverage of phosphoproteins [19]. In this chapter, computational network reconstruction methods aimed specifically at the inference of phosphorylation networks

Table 2 List of selected available protein phosphorylation site information databases

| Name | Species | URL | PTM type | Evidence |
|---|---|---|---|---|
| UniProt [52] | All | http://www.uniprot.org | All | Experimental/prediction |
| PhosphoSitePlus [53] | Human/mouse | http://www.phosphosite.org/ | Phosphorylation, ubiquitination, acetylation, methylation | Experimental |
| Phospho.ELM [5] | Eukaryotes | http://phospho.elm.eu.org/ | Phosphorylation | Experimental |
| Phospho3D [54] | Eukaryotes | http://www.phospho3d.org/ | Phosphorylation | Experimental |
| PhosphoGRID [55] | Saccharomyces cerevisiae | http://www.phosphogrid.org/ | Phosphorylation | Experimental |
| PlantsP [56] | Arabidopsis thaliana | http://plantsp.genomics.purdue.edu/ | Phosphorylation | Experimental |
| PhosPhAt [6] | Arabidopsis thaliana | http://phosphat.uni-hohenheim.de/ | Phosphorylation | Experimental/prediction |


are considered. For a general overview of selected signaling network modeling and data analysis studies of phosphoproteomics data, see ref. 57. In the supplementary data of ref. 58, the authors provide a comprehensive list of phosphorylation-related tools, including tools for the prediction of phosphorylation motifs and phosphorylation networks. In the following, phosphorylation network inference methods are categorized according to whether or not they use quantitative proteomics data [59].

3.1 Non-quantitative Data Network Inference Methods

Protein phosphorylation sites are targeted by corresponding cognate kinases [60]. It has been shown that phosphorylation sites and their flanking amino acid sequences are conserved within gene families and across species [61]. With the advancement of mass spectrometry, the phosphoproteome can be measured with high coverage in one experiment setting [62]. These phosphoproteomic data provide rich, but static information about phosphorylation events. Based on the measured phosphoproteomics data and prior kinase–substrate information, many studies focused on the prediction of kinase–substrate relations using protein sequence information as the primary input [63–65].
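Sequence-based prediction of kinase–substrate relations typically exploits the conservation of the residues flanking a phosphorylation site. One simple way to capture such conservation is a position-specific scoring matrix (PSSM). The Python sketch below is an illustration only and does not reproduce any published predictor. The training peptides are invented 7-mers with a central serine, the uniform background and the pseudocount are assumptions, and `build_pssm` and `score` are hypothetical helper names.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(peptides, pseudocount=1.0):
    """Build a log-odds PSSM (versus a uniform background) from aligned,
    equally long peptides centered on the phosphorylated residue."""
    background = 1.0 / len(AMINO_ACIDS)
    total = len(peptides) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for position in range(len(peptides[0])):
        counts = Counter(peptide[position] for peptide in peptides)
        column = {
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        }
        pssm.append(column)
    return pssm


def score(pssm, peptide):
    """Sum of the position-specific log-odds scores of a candidate peptide."""
    return sum(column[aa] for column, aa in zip(pssm, peptide))


# Invented example substrates of a hypothetical basophilic kinase
# (basic residues upstream of the central serine).
known_substrates = ["RRASVAG", "RRGSLPE", "KRLSIDD", "RRPSFEA", "RKASLGT"]
pssm = build_pssm(known_substrates)

motif_like = score(pssm, "RRESVLA")   # resembles the training peptides
unrelated = score(pssm, "DEGSPEE")    # acidic flanks, proline at +1

print(f"motif-like candidate: {motif_like:.2f}")
print(f"unrelated candidate:  {unrelated:.2f}")
```

A candidate site would be assigned to the kinase only if its score exceeds a threshold calibrated on known positives and negatives, which this sketch does not attempt.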

3.1.1 Kinase-Specific Phosphorylation Sites Prediction

With the accumulation of experimentally verified phosphorylation sites, an array of different phosphorylation site prediction methods has been developed to provide a low-cost and fast alternative to performing experiments. Some general phosphorylation site prediction methods were developed based on protein sequence or structure information without considering kinase specificity, such as NETPHOS [66], DIPHOS [67], and some others [68–70]. The main limitation of general phosphorylation site prediction methods is that they cannot provide information on the actual kinase–substrate relationships. To overcome this shortcoming, several kinase-specific phosphorylation site prediction methods have been developed. Most of the available methods are based on phosphopeptide sequence conservation and different classification methods (e.g., Position-Specific Scoring Matrix, Hidden Markov Model, Support Vector Machine), such as ScanSite [71, 72], PPSP [73], PHOSITE [74], rBBFNN [68], PredPhospho [75], NETPHOSK [76, 77], GPS [78–80], KinasePhos [81, 82], AutoMotif [83], GANNPhos [84], IEPP [85], SMALI [86], SiteSeek [87], Musite [88], PAAS [89], BAE [90], PostMod [91], PKIS [92], pkaPA [93], and others [94]. Beyond sequence information, protein 3D-structural information has also been exploited for the prediction of kinase-specific phosphorylation sites, as in PREDKIN [73, 95, 96] and Phos3D [97]. In those studies, the amino acids in the spatial vicinity of the phosphorylation site have been found to also be conserved for different kinase families. However, as phosphorylation sites have been observed to frequently reside in flexible protein regions [67],


the only modest (albeit significant) prediction improvement gained by including 3D-information can be understood. Despite best efforts, false positive predictions of phosphorylation sites cannot be avoided if predictions are based solely on sequence or structural information. In order to mitigate this source of wrongly inferred network connections, methods have been developed that integrate additional, and ideally orthogonal, information types, such as information on protein–protein interactions or protein subcellular localization, to filter out potential false positive predictions: NetworKIN [98, 99], iGPS [100], and others [101, 102]. NetworKIN has been applied in the context of JNK phosphorylation networks [103], human embryonic stem cell (ESC) differentiation [104], autophagy-related phosphorylation networks [105], and mouse skin carcinogenesis [106]. PhosNetConstruct was designed recently to decipher domain-based kinase–substrate relationships by implicitly incorporating various contextual factors such as the embedding of phosphopeptides in interaction domains [107]. For a detailed comparison of different predictors, see refs. 15, 16, 108–110. Based on the already developed computational phosphorylation site predictors, meta-predictor strategies that combine them have also been implemented [111]. Kinase-specific phosphorylation site predictions are even more challenging when phosphorylation sites are targeted by more than one kinase, possibly in a condition-specific manner. Sequence-only-based phosphorylation site predictors cannot determine which kinase phosphorylates which site under which condition [112]. Here, the availability of quantitative data may help to address this difficulty.

3.1.2 Protein–Protein Interactions (PPI)/Pathway Mapping

Phosphorylation plays an important role in the modulation of protein–protein interactions [113]. Several database resources on protein–protein interactions exist (Table 3), and most of them can be conveniently queried jointly as they have been integrated in the common query system PSICQUIC [123]. They also provide valuable protein interaction context for phosphoproteins. As reported in [19, 124], phosphoproteins in yeast and Arabidopsis have a significantly higher interaction degree than average proteins, which implies that they play a central role in processes involving protein–protein interactions. Information on PPIs has also been used to filter out false positive kinase–substrate relationships [98]. The inclusion of PPI information in the context of protein phosphorylation has been applied to the study of the DNA damage response (DDR) [125], the signaling downstream of EGFRvIII (a truncated extracellular mutant of the EGF receptor) in four glioblastoma multiforme cell lines [126], the study of the insulin signaling pathway [127], bidirectional signaling in cell–cell communication in EphB2


Table 3 List of selected protein–protein interaction databases

| Name | Interaction type | URL |
|---|---|---|
| STRING [114] | Physical/functional | http://string.embl.de/ |
| BioGRID [115] | Physical/genetic | http://thebiogrid.org/ |
| DIP [116] | Experimental | http://dip.doe-mbi.ucla.edu/dip/ |
| HitPredict [117] | Integration | http://hintdb.hgc.jp/htp/ |
| MINT [118] | Experimental | http://mint.bio.uniroma2.it/mint/ |
| IntAct [119] | Experimental | http://www.ebi.ac.uk/intact/ |
| APID [120] | Experimental | http://bioinfow.dep.usal.es/apid/index.htm |
| MIPS [121] | Experimental | http://mips.helmholtz-muenchen.de/proj/ppi/ |
| HPRD (a) [122] | Experimental | http://www.humanproteinpedia.org/ |

(a) Homo sapiens only

ephrin-B1 expressing cells [128], different murine tissues [129], the response of human MDA-MB-231 tumor xenografts (CD44-expressing tumors) to antibody therapy [130], SKBR3 breast cancer cells treated with the inhibitor Lapatinib [131], and others.

3.2 Inference Methods Exploiting Quantitative Proteomics Data

Network reconstruction methods based on non-quantitative data, as introduced above, cannot capture the dynamics of phosphorylation events. With the advancement of quantitative phosphoproteomics [132, 133], especially temporally resolved quantitative phosphoproteomics [59, 134, 135], phosphorylation networks can now be reconstructed with higher accuracy, specificity, and temporal resolution. Data normalization is an essential step in network inference based on quantitative data, especially for quantitative phosphoproteomics data from mass-spectrometric experiments. Phosphorylation levels measured under specific conditions or at specific time points can only be interpreted as relative values compared to the corresponding control samples [136]. Another need for proper data preprocessing is imposed by the parametric statistical inference methods that are applied in the data analysis, as they typically require the data to be normally distributed. Commonly used data preprocessing and normalization steps include logarithmic transformation, normalization by subtracting the median or mean, and data standardization enforcing the data to have a mean of zero and a variance of one [19]. These steps can easily be implemented using basic functions in R or by using the “normalize” function from the R package “som” (http://cran.r-project.org/web/packages/som/index.html). It has to be noted that, even though phosphorylation is a binary


process, the actual data are continuous as the measurement signal is the average over many proteins and phosphopeptides with varying phosphorylation status. Generally, normalization also needs to consider the details of the experiment design [18] to eliminate as much as possible any systematic confounding experimental factors. Although attempts were made to develop and apply complex inference methods using quantitative phosphoproteomics data, considering the current limitations of the employed technologies (e.g., limited sample numbers), simple and linear correlation-based network reconstruction methods have been used most frequently for the purpose of large-scale phosphorylation network reconstruction as they proved to be robust despite data sparsity. Pearson correlation was applied to the reconstruction of the phosphorylation network of the epidermal growth factor (EGF) stimulated HeLa cells [18] and nutrient-induced phosphorylation networks in Arabidopsis [19]. If sufficient temporal information from performed time series experiments is available, the concept of correlation can be broadened to allow time-shifted correlations such that phosphorylation events that are occurring sequentially can be discovered. Similarly, other dynamic modeling techniques such as dynamic Bayesian networks or Granger causality can be applied. However, the application of these concepts is still hampered by the low number of time points measured typically in current experimental series. Although there are a few case studies to infer large-scale phosphorylation networks ab initio as mentioned above, model-based methods such as regression-based approaches have been applied to study small-scale phosphorylation networks, see ref. 137. 
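The preprocessing steps and the time-shifted correlation described above can be sketched in a few lines. The text refers to R for this purpose. The Python version below is a stand-in using only the standard library, and it does not reproduce the “normalize” function of the “som” package. The two intensity series are invented, with the hypothetical substrate site following the hypothetical kinase site by one time point, and the function names are assumptions.

```python
import math
import statistics


def normalize(intensities):
    """Log2-transform, subtract the median, then standardize to
    mean zero and (population) variance one."""
    logged = [math.log2(value) for value in intensities]
    median = statistics.median(logged)
    centered = [value - median for value in logged]
    mean = statistics.mean(centered)
    sd = statistics.pstdev(centered)
    return [(value - mean) / sd for value in centered]


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def lagged_correlation(x, y, lag):
    """Correlate x(t) with y(t + lag) for a non-negative lag."""
    if lag == 0:
        return pearson(x, y)
    return pearson(x[:-lag], y[lag:])


# Invented raw intensities at eight time points after stimulation.
kinase_site = [1e5, 2e5, 8e5, 4e5, 2e5, 1e5, 1e5, 1e5]
substrate_site = [1e5, 1e5, 2e5, 8e5, 4e5, 2e5, 1e5, 1e5]

kinase_z = normalize(kinase_site)
substrate_z = normalize(substrate_site)

r_lag0 = lagged_correlation(kinase_z, substrate_z, 0)
r_lag1 = lagged_correlation(kinase_z, substrate_z, 1)

print(f"lag 0: r = {r_lag0:.2f}")
print(f"lag 1: r = {r_lag1:.2f}")
```

With so few time points, each additional lag removes a data point from the comparison, which is one reason why such analyses remain limited in practice.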
Among regression-based approaches, multiple linear regression (MLR) was used to study the phosphorylation networks of primary hepatocytes and hepatocellular carcinoma cell lines, and partial least squares regression (PLSR) was applied to study mammalian cell apoptotic response processes [138, 139]. Other applications of model-based approaches include Bayesian networks used to infer phosphorylation networks of the downstream signaling of CD3, CD28, and LFA-1 activation in human primary CD4+ T cells [17] and the signaling downstream of EGF stimulation [140]. Ref. 141 summarizes several issues in the application of Bayesian networks to model phosphoproteomic data. Ordinary differential equations (ODE) were also employed to study ErbB-activated pathways [142]. Logic modeling, including Boolean logic and constrained fuzzy logic modeling, was applied to study signaling networks in hepatocellular carcinoma cells [143, 144]. For the integration of different types of prior information and knowledge in the network reconstruction process, rule-based methods have been used [145, 146]. Modular response analysis, which aims to detect network topology responses after different stimulations, was used to infer MAPK


networks in response to NGF or EGF stimulation [147]. Multiple-inputs-multiple-outputs (MIMO) models that predict the quantitative outcomes of combinatorial perturbations were applied to the study of EGFR/MAPK and PI3K/AKT pathways in a breast cancer cell line [148]. As yet another information-based method, maximum entropy was used to reconstruct a tyrosine signaling pathway [149].

3.3 Integration Method

With the increasing accumulation of prior biological knowledge, integration strategies that combine different kinds of data sources have been developed. For example, the prediction software NetworKIN filters out false-positive kinase–target relations by combining protein–protein interaction or protein subcellular localization information [98]. As one widely used integration strategy, Bayesian statistics frameworks were designed to infer phosphorylation networks in stimulated cells by integrating different types of data in addition to temporal quantitative phosphoproteomic data, such as amino acid sequence data, genomic context data (e.g., gene fusion, gene neighborhood, phylogenetic profiles), primary experimental evidence (e.g., physical protein interaction, gene co-expression), known pathway databases, and literature mining [112]. Recently, PhosphoChain has been developed to reconstruct kinase and phosphatase networks by integrating mRNA expression levels of regulators and targets with motif detection algorithms and optional prior information, where mRNA expression levels served as proxies for protein activities [58]. Other experimental perturbation methods like genetic ablation can also be used to assist in the inference of phosphorylation networks [150].
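To illustrate how heterogeneous evidence can be combined in a Bayesian manner, the following Python sketch updates the prior odds of a kinase–substrate relation with one likelihood ratio per evidence type. It is a naive Bayes simplification that assumes the evidence types are conditionally independent, and it is not the procedure used by NetworKIN, PhosphoChain, or the framework of ref. 112. The prior, the likelihood ratios, and the evidence names are invented for illustration.

```python
import math


def posterior_probability(prior, likelihood_ratios):
    """Combine a prior probability with likelihood ratios from
    (assumed) conditionally independent evidence sources."""
    log_odds = math.log(prior / (1.0 - prior))
    log_odds += sum(math.log(ratio) for ratio in likelihood_ratios)
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical prior: 1 in 100 candidate kinase-substrate pairs is true.
prior = 0.01

# Hypothetical likelihood ratios, P(evidence | true) / P(evidence | false).
evidence = {
    "sequence_motif_match": 8.0,
    "protein_interaction_support": 5.0,
    "shared_subcellular_compartment": 2.0,
}

p_motif_only = posterior_probability(prior, [evidence["sequence_motif_match"]])
p_all = posterior_probability(prior, evidence.values())

print(f"motif evidence only: {p_motif_only:.3f}")
print(f"all evidence types:  {p_all:.3f}")
```

A likelihood ratio below one, for example for a pair located in different compartments, lowers the posterior in the same way, which is how contextual information can filter out false-positive sequence-based predictions.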

4 Summary and Future Challenges

Protein phosphorylation events play an important role in cellular signaling, and therefore, the reconstruction of phosphorylation networks holds great potential for investigating the corresponding signal transduction events. As reviewed in this chapter, many different computational methods have been developed to infer phosphorylation networks based on different statistical learning methods using different types of data, such as sequence, structure, or (quantitative) phosphoproteomics data. The reconstruction of the phosphorylation network constitutes an important step for subsequent network modeling approaches, such as exploring more detailed kinetic information for the corresponding phosphorylation signaling cascades. As phosphorylation processes were observed to be cellular compartment-dependent [150], more refined methods need to be


developed. Besides subcellular location, cell-type-specific signaling networks are another important direction for future research [151, 152], especially given the advancement of single-cell experimental techniques. Quantitative data are still seldom used in the reconstruction of phosphorylation networks because of the limitations of the currently available phosphoproteomics techniques. In particular, current methods cannot comprehensively cover all phosphoproteins in a biological sample, and of those proteins that have been detected, renewed detection in another experiment is not guaranteed either. Hence, current inference methods are severely limited, as the inferred relationships may frequently be indirect and mediated by one or several intervening, but undetected, proteins. Therefore, the utility of network inference methods will also depend to a great extent on the improvement of experimental phosphoproteomics techniques, especially with regard to temporal resolution, coverage, and reproducibility.

References

1. Barabási A-L, Oltvai ZN (2004) Network biology: understanding the cell’s functional organization. Nat Rev Genet 5:101–113. doi:10.1038/nrg1272
2. Shen-Orr SS, Milo R, Mangan S, Alon U (2002) Network motifs in the transcriptional regulation network of Escherichia coli. Nat Genet 31:64–68. doi:10.1038/ng881
3. Jensen ON (2006) Interpreting the protein language using proteomics. Nat Rev Mol Cell Biol 7:391–403. doi:10.1038/nrm1939
4. The UniProt Consortium (2013) Activities at the Universal Protein Resource (UniProt). Nucleic Acids Res 42:D191–D198. doi:10.1093/nar/gkt1140
5. Dinkel H, Chica C, Via A, Gould CM, Jensen LJ, Gibson TJ, Diella F (2011) Phospho.ELM: a database of phosphorylation sites – update 2011. Nucleic Acids Res 39:D261–D267. doi:10.1093/nar/gkq1104
6. Zulawski M, Braginets R, Schulze WX (2013) PhosPhAt goes kinases – searchable protein kinase target information in the plant phosphorylation site database PhosPhAt. Nucleic Acids Res 41:D1176–D1184. doi:10.1093/nar/gks1081
7. Hunter T, Plowman GD (1997) The protein kinases of budding yeast: six score and more. Trends Biochem Sci 22:18–22
8. Manning G, Whyte DB, Martinez R, Hunter T, Sudarsanam S (2002) The protein kinase complement of the human genome. Science 298:1912–1934. doi:10.1126/science.1075762
9. McCubrey JA, May WS, Duronio V, Mufson A (2000) Serine/threonine phosphorylation in cytokine signal transduction. Leukemia 14:9–21. doi:10.1038/sj.leu.2401657
10. Davis RJ (2000) Signal transduction by the JNK group of MAP kinases. In: Letts LG, Morgan DW (eds) Inflammatory process and inflammatory research. Birkhäuser Basel, Basel, pp 13–21
11. Lee T-Y, Bo-Kai Hsu J, Chang W-C, Huang H-D (2010) RegPhos: a system to explore the protein kinase-substrate phosphorylation network in humans. Nucleic Acids Res 39:D777–D787. doi:10.1093/nar/gkq970
12. Li X, Wilmanns M, Thornton J, Kohn M (2013) Elucidating human phosphatase-substrate networks. Sci Signal 6:rs10. doi:10.1126/scisignal.2003203
13. Ptacek J, Snyder M (2006) Charging it up: global analysis of protein phosphorylation. Trends Genet 22:545–554. doi:10.1016/j.tig.2006.08.005
14. Mok J, Zhu X, Snyder M (2011) Dissecting phosphorylation networks: lessons learned from yeast. Expert Rev Proteomics 8:775–786. doi:10.1586/epr.11.64
15. Xue Y, Gao X, Cao J, Liu Z, Jin C, Wen L, Yao X, Ren J (2010) A summary of computational resources for protein phosphorylation. Curr Protein Pept Sci 11:485–496
16. Trost B, Kusalik A (2011) Computational prediction of eukaryotic phosphorylation sites. Bioinformatics 27:2927–2935. doi:10.1093/bioinformatics/btr525
17. Sachs K, Perez O, Pe’er D, Lauffenburger DA, Nolan GP (2005) Causal protein-signaling networks derived from multiparameter single-cell data. Science 308:523–529. doi:10.1126/science.1105809
18. Imamura H, Yachie N, Saito R, Ishihama Y, Tomita M (2010) Towards the systematic discovery of signal transduction networks using phosphorylation dynamics data. BMC Bioinformatics 11:232. doi:10.1186/1471-2105-11-232
19. Duan G, Walther D, Schulze W (2013) Reconstruction and analysis of nutrient-induced phosphorylation networks in Arabidopsis thaliana. Front Plant Sci 4:540. doi:10.3389/fpls.2013.00540
20. Schena M, Shalon D, Davis RW, Brown PO (1995) Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science 270:467–470. doi:10.1126/science.270.5235.467
21. Lu J, Getz G, Miska EA, Alvarez-Saavedra E, Lamb J, Peck D, Sweet-Cordero A, Ebert BL, Mak RH, Ferrando AA, Downing JR, Jacks T, Horvitz HR, Golub TR (2005) MicroRNA expression profiles classify human cancers. Nature 435:834–838. doi:10.1038/nature03702
22. Aebersold R, Mann M (2003) Mass spectrometry-based proteomics. Nature 422:198–207. doi:10.1038/nature01511
23. Fiehn O (2002) Metabolomics – the link between genotypes and phenotypes. Plant Mol Biol 48:155–171
24. Tudor CO, Arighi CN, Wang Q, Wu CH, Vijay-Shanker K (2012) The eFIP system for text mining of protein interaction networks of phosphorylated proteins. Database 2012:bas044. doi:10.1093/database/bas044
25. Ferraro E, Via A, Ausiello G, Helmer-Citterich M (2005) A neural strategy for the inference of SH3 domain-peptide interaction specificity. BMC Bioinformatics 6:S13. doi:10.1186/1471-2105-6-S4-S13
26. Finn RD, Miller BL, Clements J, Bateman A (2014) iPfam: a database of protein family and domain interactions found in the Protein Data Bank. Nucleic Acids Res 42:D364–D373. doi:10.1093/nar/gkt1210
27. Werhli AV, Grzegorczyk M, Husmeier D (2006) Comparative evaluation of reverse engineering gene regulatory networks with relevance networks, graphical gaussian models and Bayesian networks. Bioinformatics 22:2523–2531. doi:10.1093/bioinformatics/btl391
28. Maetschke SR, Madhamshettiwar PB, Davis MJ, Ragan MA (2014) Supervised, semi-supervised and unsupervised inference of gene regulatory networks. Brief Bioinform 15:195–211. doi:10.1093/bib/bbt034
29. National Research Council (2007) Nutrigenomics and beyond: informing the future - workshop summary. Washington, DC: The National Academies Press
30. Hempel S, Koseska A, Nikoloski Z, Kurths J (2011) Unraveling gene regulatory networks from time-resolved gene expression data – a measures comparison study. BMC Bioinformatics 12:292. doi:10.1186/1471-2105-12-292
31. Walther D, Strassburg K, Durek P, Kopka J (2010) Metabolic pathway relationships revealed by an integrative analysis of the transcriptional and metabolic temperature stress-response dynamics in yeast. OMICS 14:261–274. doi:10.1089/omi.2010.0010
32. Friedman N, Linial M, Nachman I, Pe’er D (2000) Using Bayesian networks to analyze expression data. J Comput Biol 7:601–620. doi:10.1089/106652700750050961
33. Schäfer J, Strimmer K (2005) A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Stat Appl Genet Mol Biol 4:32
34. Schäfer J, Strimmer K (2005) An empirical Bayes approach to inferring large-scale gene association networks. Bioinformatics 21:754–764. doi:10.1093/bioinformatics/bti062
35. Michailidis G, D’Alché-Buc F (2013) Autoregressive models for gene regulatory network inference: sparsity, stability and causality issues. Math Biosci 246:326–334. doi:10.1016/j.mbs.2013.10.003
36. Tibshirani R (1996) Regression shrinkage and selection via the Lasso. J R Stat Soc B 58:267–288
37. Zhang Y, Pu Y, Zhang H, Su Y, Zhang L, Zhou J (2013) Using gene expression programming to infer gene regulatory networks from time-series data. Comput Biol Chem 47:198–206. doi:10.1016/j.compbiolchem.2013.09.004
38. Brouard C, Vrain C, Dubois J, Castel D, Debily M-A, d’Alché-Buc F (2013) Learning a Markov Logic network for supervised gene regulatory network inference. BMC Bioinformatics 14:273.
doi:10.1186/1471-2105-14-273 Hase T, Ghosh S, Yamanaka R, Kitano H (2013) Harnessing diversity towards the reconstructing of large scale gene regulatory networks. PLoS Comput Biol 9:e1003361. doi:10.1371/journal.pcbi.1003361 Li Z, Li P, Krishnan A, Liu J (2011) Largescale dynamic gene regulatory network inference combining differential equation models with local dynamic Bayesian network analysis. Bioinformatics 27:2686–2691. doi:10.1093/ bioinformatics/btr454

Computational Phosphorylation Network Reconstruction: Methods and Resources 41. Meyer PE, Lafitte F, Bontempi G (2008) minet: a R/Bioconductor package for inferring large transcriptional networks using mutual information. BMC Bioinformatics 9:461. doi:10.1186/1471-2105-9-461 42. Bonneau R, Reiss DJ, Shannon P, Facciotti M, Hood L, Baliga NS, Thorsson V (2006) The Inferelator: an algorithm for learning parsimonious regulatory networks from systems-biology data sets de novo. Genome Biol 7:R36. doi:10.1186/gb-2006-7-5-r36 43. Vallat L, Kemper CA, Jung N, MaumyBertrand M, Bertrand F, Meyer N, Pocheville A, Fisher JW, Gribben JG, Bahram S (2013) Reverse-engineering the genetic circuitry of a cancer cell with predicted intervention in chronic lymphocytic leukemia. Proc Natl Acad Sci 110:459–464. doi:10.1073/ pnas.1211130110 44. Lèbre S (2009) Inferring dynamic genetic networks with low order independencies. Stat Appl Genet Mol Biol 8:9. doi:10.2202/ 1544-6115.1294 45. Lèbre S, Becq J, Devaux F, Stumpf MPH, Lelandais G (2010) Statistical inference of the time-varying structure of gene-regulation networks. BMC Syst Biol 4:130. doi:10.1186/1752-0509-4-130 46. Abegaz F, Wit E (2013) Sparse time series chain graphical models for reconstructing genetic networks. Biostatistics 14:586–599. doi:10.1093/biostatistics/kxt005 47. Stolovitzky G, Monroe D, Califano A (2007) Dialogue on reverse-engineering assessment and methods: the DREAM of highthroughput pathway inference. Ann N Y Acad Sci 1115:1–22. doi:10.1196/annals.1407.021 48. Schaffter T, Marbach D, Floreano D (2011) GeneNetWeaver: In silico benchmark generation and performance profiling of network inference methods. Bioinformatics 27:2263– 2270. doi:10.1093/bioinformatics/btr373 49. Mumby M, Brekken D (2005) Phosphoproteomics: new insights into cellular signaling. Genome Biol 6:230. doi:10.1186/gb-2005-6-9-230 50. Collins MO, Yu L, Choudhary JS (2007) Analysis of protein phosphorylation on a proteome-scale. 
Proteomics 7:2751–2768. doi:10.1002/pmic.200700145 51. Ptacek J, Devgan G, Michaud G, Zhu H, Zhu X, Fasolo J, Guo H, Jona G, Breitkreutz A, Sopko R, McCartney RR, Schmidt MC, Rachidi N, Lee S-J, Mah AS, Meng L, Stark MJR, Stern DF, De Virgilio C, Tyers M, Andrews B, Gerstein M, Schweitzer B, Predki PF, Snyder M (2005) Global analysis of protein phosphorylation in yeast. Nature 438:679–684. doi:10.1038/nature04187

189

52. The UniProt Consortium (2014) Activities at the Universal Protein Resource (UniProt). Nucleic Acids Res 42:7486. doi:10.1093/ nar/gku469 53. Hornbeck PV, Kornhauser JM, Tkachev S, Zhang B, Skrzypek E, Murray B, Latham V, Sullivan M (2012) PhosphoSitePlus: a comprehensive resource for investigating the structure and function of experimentally determined post-translational modifications in man and mouse. Nucleic Acids Res 40:D261–D270. doi:10.1093/nar/gkr1122 54. Zanzoni A, Carbajo D, Diella F, Gherardini PF, Tramontano A, Helmer-Citterich M, Via A (2011) Phospho3D 2.0: an enhanced database of three-dimensional structures of phosphorylation sites. Nucleic Acids Res 39:D268–D271. doi:10.1093/nar/gkq936 55. Sadowski I, Breitkreutz B-J, Stark C, Su T-C, Dahabieh M, Raithatha S, Bernhard W, Oughtred R, Dolinski K, Barreto K, Tyers M (2013) The phosphoGRID Saccharomyces cerevisiae protein phosphorylation site database: version 2.0 update. Database 2013:bat026. doi:10.1093/database/bat026 56. Tchieu JH (2003) The PlantsP and PlantsT functional genomics databases. Nucleic Acids Res 31:342–344. doi:10.1093/nar/gkg025 57. Terfve C, Saez-Rodriguez J (2012) Modeling signaling networks using high-throughput phospho-proteomics. Adv Exp Med Biol 736:19–57. doi:10.1007/978-1-4419-7210-1_2 58. Chen W-M, Danziger SA, Chiang J-H, Aitchison JD (2013) PhosphoChain: a novel algorithm to predict kinase and phosphatase networks from high-throughput expression data. Bioinformatics 29:2435–2444. doi:10.1093/bioinformatics/btt387 59. Olsen JV, Blagoev B, Gnad F, Macek B, Kumar C, Mortensen P, Mann M (2006) Global, in vivo, and site-specific phosphorylation dynamics in signaling networks. Cell 127:635– 648. doi:10.1016/j.cell.2006.09.026 60. Nilsson CL (2012) Advances in quantitative phosphoproteomics. Anal Chem 84:735– 746. doi:10.1021/ac202877y 61. Maathuis FJ (2008) Conservation of protein phosphorylation sites within gene families and across species. Plant Signal Behav 3:1011–1013 62. 
Peters EC, Brock A, Ficarro SB (2004) Exploring the phosphoproteome with mass spectrometry. Mini Rev Med Chem 4:313–324 63. Puente LG, Lee REC, Megeney LA (2009) Reconstructing regulatory kinase pathways from phosphopeptide data: a bioinformatics approach. Methods Mol Biol 527:311–319. doi:10.1007/978-1-60327-834-8_23 64. Tan CSH, Linding R (2009) Experimental and computational tools useful for (re)construction

190

65.

66.

67.

68.

69.

70.

71.

72.

73.

74.

75.

76.

Guangyou Duan and Dirk Walther of dynamic kinase-substrate networks. Proteomics 9:5233–5242. doi:10.1002/ pmic.200900266 Ren J, Gao X, Liu Z, Cao J, Ma Q, Xue Y (2011) Computational analysis of phosphoproteomics: progresses and perspectives. Curr Protein Pept Sci 12:591–601 Blom N, Gammeltoft S, Brunak S (1999) Sequence and structure-based prediction of eukaryotic protein phosphorylation sites. J Mol Biol 294:1351–1362. doi:10.1006/ jmbi.1999.3310 Iakoucheva LM, Radivojac P, Brown CJ, O’Connor TR, Sikes JG, Obradovic Z, Dunker AK (2004) The importance of intrinsic disorder for protein phosphorylation. Nucleic Acids Res 32:1037–1049. doi:10.1093/nar/gkh253 Berry EA, Dalby AR, Yang ZR (2004) Reduced bio basis function neural network for identification of protein phosphorylation sites: comparison with pattern recognition algorithms. Comput Biol Chem 28:75–85 Plewczyński D, Tkacz A, Godzik A, Rychlewski L (2005) A support vector machine approach to the identification of phosphorylation sites. Cell Mol Biol Lett 10:73–89 Swaminathan K, Adamczak R, Porollo A, Meller J (2010) Enhanced prediction of conformational flexibility and phosphorylation in proteins. Adv Exp Med Biol 680:307–319. doi:10.1007/978-1-4419-5913-3_35 Yaffe MB, Leparc GG, Lai J, Obata T, Volinia S, Cantley LC (2001) A motif-based profile scanning approach for genome-wide prediction of signaling pathways. Nat Biotechnol 19:348–353. doi:10.1038/86737 Obenauer JC, Cantley LC, Yaffe MB (2003) Scansite 2.0: proteome-wide prediction of cell signaling interactions using short sequence motifs. Nucleic Acids Res 31:3635–3641. doi:10.1093/nar/gkg584 Brinkworth RI, Breinl RA, Kobe B (2003) Structural basis and prediction of substrate specificity in protein serine/threonine kinases. Proc Natl Acad Sci U S A 100:74–79. doi:10.1073/pnas.0134224100 Koenig M, Grabe N (2004) Highly specific prediction of phosphorylation sites in proteins. Bioinformatics 20:3620–3627. 
doi:10.1093/bioinformatics/bth455 Kim JH, Lee J, Oh B, Kimm K, Koh I (2004) Prediction of phosphorylation sites using SVMs. Bioinformatics 20:3179–3184. doi:10.1093/bioinformatics/bth382 Blom N, Sicheritz-Pontén T, Gupta R, Gammeltoft S, Brunak S (2004) Prediction of post-translational glycosylation and phosphorylation of proteins from the amino acid

77.

78.

79.

80.

81.

82.

83.

84.

85.

86.

sequence. Proteomics 4:1633–1649. doi:10.1002/pmic.200300771 Hjerrild M, Stensballe A, Rasmussen TE, Kofoed CB, Blom N, Sicheritz-Ponten T, Larsen MR, Brunak S, Jensen ON, Gammeltoft S (2004) Identification of phosphorylation sites in protein kinase A substrates using artificial neural networks and mass spectrometry. J Proteome Res 3:426–433 Zhou F-F, Xue Y, Chen G-L, Yao X (2004) GPS: a novel group-based phosphorylation predicting and scoring method. Biochem Biophys Res Commun 325:1443–1448. doi:10.1016/j.bbrc.2004.11.001 Xue Y, Zhou F, Zhu M, Ahmed K, Chen G, Yao X (2005) GPS: a comprehensive www server for phosphorylation sites prediction. Nucleic Acids Res 33:W184–W187. doi:10.1093/nar/gki393 Xue Y, Ren J, Gao X, Jin C, Wen L, Yao X (2008) GPS 2.0, a tool to predict kinasespecific phosphorylation sites in hierarchy. Mol Cell Proteomics 7:1598–1608. doi:10.1074/mcp. M700574-MCP200 Huang H-D, Lee T-Y, Tzeng S-W, Horng J-T (2005) KinasePhos: a web tool for identifying protein kinase-specific phosphorylation sites. Nucleic Acids Res 33:W226–W229. doi:10.1093/nar/gki471 Wong Y-H, Lee T-Y, Liang H-K, Huang C-M, Wang T-Y, Yang Y-H, Chu C-H, Huang H-D, Ko M-T, Hwang J-K (2007) KinasePhos 2.0: a web server for identifying protein kinase-specific phosphorylation sites based on sequences and coupling patterns. Nucleic Acids Res 35:W588–W594. doi:10.1093/ nar/gkm322 Plewczynski D, Tkacz A, Wyrwicz LS, Rychlewski L (2005) AutoMotif server: prediction of single residue post-translational modifications in proteins. Bioinformatics 21:2525–2527. doi:10.1093/bioinformatics/bti333 Tang Y-R, Chen Y-Z, Canchaya CA, Zhang Z (2007) GANNPhos: a new phosphorylation site predictor based on a genetic algorithm integrated neural network. Protein Eng Des Sel 20:405–412. doi:10.1093/protein/gzm035 Wang M, Li C, Chen W, Wang C (2008) Prediction of PK-specific phosphorylation site based on information entropy. Sci China C Life Sci 51:12–20. 
doi:10.1007/ s11427-008-0012-1 Li L, Wu C, Huang H, Zhang K, Gan J, Li SS-C (2008) Prediction of phosphotyrosine signaling networks using a scoring matrixassisted ligand identification approach. Nucleic Acids Res 36:3263–3273. doi:10.1093/nar/ gkn161

Computational Phosphorylation Network Reconstruction: Methods and Resources 87. Yoo PD, Ho YS, Zhou BB, Zomaya AY (2008) SiteSeek: post-translational modification analysis using adaptive locality-effective kernel methods and new profiles. BMC Bioinformatics 9:272. doi:10.1186/1471-2105-9-272 88. Gao J, Thelen JJ, Dunker AK, Xu D (2010) Musite, a tool for global prediction of general and kinase-specific phosphorylation sites. Mol Cell Proteomics 9:2586–2600. doi:10.1074/ mcp.M110.001388 89. Sobolev B, Filimonov D, Lagunin A, Zakharov A, Koborova O, Kel A, Poroikov V (2010) Functional classification of proteins based on projection of amino acid sequences: application for prediction of protein kinase substrates. BMC Bioinformatics 11:313. doi:10.1186/1471-2105-11-313 90. Yu Z, Deng Z, Wong H-S, Tan L (2010) Identifying protein-kinase-specific phosphorylation sites based on the Bagging-AdaBoost ensemble approach. IEEE Trans Nanobioscience 9:132–143. doi:10.1109/ TNB.2010.2043682 91. Jung I, Matsuyama A, Yoshida M, Kim D (2010) PostMod: sequence based prediction of kinase-specific phosphorylation sites with indirect relationship. BMC Bioinformatics 11:S10. doi:10.1186/1471-2105-11-S1-S10 92. Zou L, Wang M, Shen Y, Liao J, Li A, Wang M (2013) PKIS: computational identification of protein Kinases for experimentally discovered protein Phosphorylation sites. BMC Bioinformatics 14:247. doi:10.1186/14712105-14-247 93. Neuberger G, Schneider G, Eisenhaber F (2007) pkaPS: prediction of protein kinase A phosphorylation sites with the simplified kinase-substrate binding model. Biol Direct 2:1. doi:10.1186/1745-6150-2-1 94. Dang TH, Van Leemput K, Verschoren A, Laukens K (2008) Prediction of kinasespecific phosphorylation sites using conditional random fields. Bioinformatics 24:2857–2864. doi:10.1093/bioinformatics/btn546 95. Saunders NFW, Kobe B (2008) The Predikin webserver: improved prediction of protein kinase peptide specificity using structural information. 
Nucleic Acids Res 36:W286– W290. doi:10.1093/nar/gkn279 96. Saunders NFW, Brinkworth RI, Huber T, Kemp BE, Kobe B (2008) Predikin and PredikinDB: a computational framework for the prediction of protein kinase peptide specificity and an associated database of phosphorylation sites. BMC Bioinformatics 9:245. doi:10.1186/1471-2105-9-245 97. Durek P, Schudoma C, Weckwerth W, Selbig J, Walther D (2009) Detection and character-

98.

99.

100.

101.

102.

103.

104.

105.

106.

191

ization of 3D-signature phosphorylation site motifs and their contribution towards improved phosphorylation site prediction in proteins. BMC Bioinformatics 10:117. doi:10.1186/1471-2105-10-117 Linding R, Jensen LJ, Ostheimer GJ, van Vugt MA, Jørgensen C, Miron IM, Diella F, Colwill K, Taylor L, Elder K, Metalnikov P, Nguyen V, Pasculescu A, Jin J, Park JG, Samson LD, Woodgett JR, Russell RB, Bork P, Yaffe MB, Pawson T (2007) Systematic discovery of in vivo phosphorylation networks. Cell 129:1415–1426. doi:10.1016/j. cell.2007.05.052 Linding R, Jensen LJ, Pasculescu A, Olhovsky M, Colwill K, Bork P, Yaffe MB, Pawson T (2008) NetworKIN: a resource for exploring cellular phosphorylation networks. Nucleic Acids Res 36:D695–D699. doi:10.1093/ nar/gkm902 Song C, Ye M, Liu Z, Cheng H, Jiang X, Han G, Songyang Z, Tan Y, Wang H, Ren J, Xue Y, Zou H (2012) Systematic analysis of protein phosphorylation networks from phosphoproteomic data. Mol Cell Proteomics 11:1070– 1083. doi:10.1074/mcp.M111.012625 Liu Z, Ren J, Cao J, He J, Yao X, Jin C, Xue Y (2013) Systematic analysis of the Plkmediated phosphoregulation in eukaryotes. Brief Bioinform 14:344–360 Bensimon A, Schmidt A, Ziv Y, Elkon R, Wang S-Y, Chen DJ, Aebersold R, Shiloh Y (2010) ATM-dependent and -independent dynamics of the nuclear phosphoproteome after DNA damage. Sci Signal 3:3. doi:10.1126/scisignal.2001034 Bakal C, Linding R, Llense F, Heffern E, Martin-Blanco E, Pawson T, Perrimon N (2008) Phosphorylation networks regulating JNK activity in diverse genetic backgrounds. Science 322:453–456. doi:10.1126/science. 1158739 Van Hoof D, Muñoz J, Braam SR, Pinkse MWH, Linding R, Heck AJR, Mummery CL, Krijgsveld J (2009) Phosphorylation dynamics during early differentiation of human embryonic stem cells. Cell Stem Cell 5:214– 226. 
doi:10.1016/j.stem.2009.05.021 Bennetzen MV, Mariño G, Pultz D, Morselli E, Færgeman NJ, Kroemer G, Andersen JS (2012) Phosphoproteomic analysis of cells treated with longevity-related autophagy inducers. Cell Cycle 11:1827–1840 Zanivan S, Meves A, Behrendt K, Schoof EM, Neilson LJ, Cox J, Tang HR, Kalna G, van Ree JH, van Deursen JM, Trempus CS, Machesky LM, Linding R, Wickström SA, Fässler R, Mann M (2013) In vivo SILACbased proteomics reveals phosphoproteome

192

107.

108.

109.

110.

111.

112.

113.

114.

115.

116.

Guangyou Duan and Dirk Walther changes during mouse skin carcinogenesis. Cell Rep 3:552–566. doi:10.1016/j. celrep.2013.01.003 Damle NP, Mohanty D (2014) Deciphering kinase-substrate relationships by analysis of domain-specific phosphorylation network. Bioinformatics 30:1730–1738. doi:10.1093/ bioinformatics/btu112 Kobe B, Kampmann T, Forwood JK, Listwan P, Brinkworth RI (2005) Substrate specificity of protein kinases and computational prediction of substrates. Biochim Biophys Acta 1754:200–209. doi:10.1016/j.bbapap. 2005.07.036 Hjerrild M, Gammeltoft S (2006) Phosphoproteomics toolbox: computational biology, protein chemistry and mass spectrometry. FEBS Lett 580:4764–4770. doi:10.1016/j.febslet.2006.07.068 Miller ML, Blom N (2009) Kinase-specific prediction of protein phosphorylation sites. Methods Mol Biol 527:299–310. doi:10.1007/978-1-60327-834-8_22 Wan J, Kang S, Tang C, Yan J, Ren Y, Liu J, Gao X, Banerjee A, Ellis LBM, Li T (2008) Meta-prediction of phosphorylation sites with weighted voting and restricted grid search parameter selection. Nucleic Acids Res 36:e22. doi:10.1093/nar/gkm848 Santra T, Kholodenko B, Kolch W (2012) An integrated Bayesian framework for identifying phosphorylation networks in stimulated cells. Adv Exp Med Biol 736:59–80. doi:10.1007/978-1-4419-7210-1_3 Seet BT, Dikic I, Zhou M-M, Pawson T (2006) Reading protein modifications with interaction domains. Nat Rev Mol Cell Biol 7:473–483. doi:10.1038/nrm1960 Franceschini A, Szklarczyk D, Frankild S, Kuhn M, Simonovic M, Roth A, Lin J, Minguez P, Bork P, von Mering C, Jensen LJ (2013) STRING v9.1: protein-protein interaction networks, with increased coverage and integration. Nucleic Acids Res 41:D808– D815. 
doi:10.1093/nar/gks1094 Chatr-Aryamontri A, Breitkreutz B-J, Heinicke S, Boucher L, Winter A, Stark C, Nixon J, Ramage L, Kolas N, O’Donnell L, Reguly T, Breitkreutz A, Sellam A, Chen D, Chang C, Rust J, Livstone M, Oughtred R, Dolinski K, Tyers M (2013) The BioGRID interaction database: 2013 update. Nucleic Acids Res 41:D816–D823. doi:10.1093/ nar/gks1158 Salwinski L, Miller CS, Smith AJ, Pettit FK, Bowie JU, Eisenberg D (2004) The database of interacting proteins: 2004 update. Nucleic Acids Res 32:D449–D451. doi:10.1093/ nar/gkh086

117. Patil A, Nakai K, Nakamura H (2011) HitPredict: a database of quality assessed protein-protein interactions in nine species. Nucleic Acids Res 39:D744–D749 118. Licata L, Briganti L, Peluso D, Perfetto L, Iannuccelli M, Galeota E, Sacco F, Palma A, Nardozza AP, Santonico E, Castagnoli L, Cesareni G (2012) MINT, the molecular interaction database: 2012 update. Nucleic Acids Res 40:D857–D861. doi:10.1093/ nar/gkr930 119. Kerrien S, Aranda B, Breuza L, Bridge A, Broackes-Carter F, Chen C, Duesbury M, Dumousseau M, Feuermann M, Hinz U, Jandrasits C, Jimenez RC, Khadake J, Mahadevan U, Masson P, Pedruzzi I, Pfeiffenberger E, Porras P, Raghunath A, Roechert B, Orchard S, Hermjakob H (2012) The IntAct molecular interaction database in 2012. Nucleic Acids Res 40:D841–D846. doi:10.1093/nar/gkr1088 120. Prieto C, De Las Rivas J (2006) APID: agile protein interaction data analyzer. Nucleic Acids Res 34:W298–W302. doi:10.1093/ nar/gkl128 121. Mewes HW, Ruepp A, Theis F, Rattei T, Walter M, Frishman D, Suhre K, Spannagl M, Mayer KFX, Stümpflen V, Antonov A (2010) MIPS: curated databases and comprehensive secondary data resources in 2010. Nucleic Acids Res 1–5. doi:10.1093/nar/gkq1157 122. Prasad TSK, Kandasamy K, Pandey A (2009) Human protein reference database and human proteinpedia as discovery tools for systems biology. Methods Mol Biol 577:67–79. doi:10.1007/978-1-60761-232-2 123. Del-Toro N, Dumousseau M, Orchard S, Jimenez RC, Galeota E, Launay G, Goll J, Breuer K, Ono K, Salwinski L, Hermjakob H (2013) A new reference implementation of the PSICQUIC web service. Nucleic Acids Res 41:W601–W606. doi:10.1093/nar/ gkt392 124. Yachie N, Saito R, Sugiyama N, Tomita M, Ishihama Y (2011) Integrative features of the yeast phosphoproteome and protein– protein interaction map. PLoS Comput Biol 7:e1001064. doi:10.1371/journal. pcbi.1001064 125. 
Matsuoka S, Ballif BA, Smogorzewska A, McDonald ER, Hurov KE, Luo J, Bakalarski CE, Zhao Z, Solimini N, Lerenthal Y, Shiloh Y, Gygi SP, Elledge SJ (2007) ATM and ATR substrate analysis reveals extensive protein networks responsive to DNA damage. Science 316:1160–1166. doi:10.1126/ science.1140321 126. Huang PH, Mukasa A, Bonavia R, Flynn RA, Brewer ZE, Cavenee WK, Furnari FB,

Computational Phosphorylation Network Reconstruction: Methods and Resources

127.

128.

129.

130.

131.

132.

133.

134.

135.

White FM (2007) Quantitative analysis of EGFRvIII cellular signaling networks reveals a combinatorial therapeutic strategy for glioblastoma. Proc Natl Acad Sci U S A 104:12867–12872. doi:10.1073/ pnas.0705158104 Krüger M, Kratchmarova I, Blagoev B, Tseng Y-H, Kahn CR, Mann M (2008) Dissection of the insulin signaling pathway via quantitative phosphoproteomics. Proc Natl Acad Sci U S A 105:2451–2456. doi:10.1073/ pnas.0711713105 Jørgensen C, Sherman A, Chen GI, Pasculescu A, Poliakov A, Hsiung M, Larsen B, Wilkinson DG, Linding R, Pawson T (2009) Cellspecific information processing in segregating populations of Eph receptor ephrin-expressing cells. Science 326:1502–1509. doi:10.1126/ science.1176615 Huttlin EL, Jedrychowski MP, Elias JE, Goswami T, Rad R, Beausoleil SA, Villén J, Haas W, Sowa ME, Gygi SP (2010) A tissuespecific atlas of mouse protein phosphorylation and expression. Cell 143:1174–1189. doi:10.1016/j.cell.2010.12.001 Weigand S, Herting F, Maisel D, Nopora A, Voss E, Schaab C, Klammer M, Tebbe A (2012) Global quantitative phosphoproteome analysis of human tumor xenografts treated with a CD44 antagonist. Cancer Res 72:4329–4339. doi:10.1158/0008-5472. CAN-12-0136 Imami K, Sugiyama N, Imamura H, Wakabayashi M, Tomita M, Taniguchi M, Ueno T, Toi M, Ishihama Y (2012) Temporal profiling of lapatinib-suppressed phosphorylation signals in EGFR/HER2 pathways. Mol Cell Proteomics 11:1741–1757. doi:10.1074/mcp.M112.019919 White FM (2008) Quantitative phosphoproteomic analysis of signaling network dynamics. Curr Opin Biotechnol 19:404–409. doi:10.1016/j.copbio.2008.06.006 Morandell S, Grosstessner-Hain K, Roitinger E, Hudecz O, Lindhorst T, Teis D, Wrulich OA, Mazanek M, Taus T, Ueberall F, Mechtler K, Huber LA (2010) QIKS – Quantitative identification of kinase substrates. Proteomics 10:2015–2025. 
doi:10.1002/pmic.200900749 Niittylä T, Fuglsang AT, Palmgren MG, Frommer WB, Schulze WX (2007) Temporal analysis of sucrose-induced phosphorylation changes in plasma membrane proteins of Arabidopsis. Mol Cell Proteomics 6: 1711–1726. doi:10.1074/mcp. M700164MCP200 Engelsberger WR, Schulze WX (2012) Nitrate and ammonium lead to distinct global dynamic phosphorylation patterns when

136.

137.

138.

139.

140.

141.

142.

143.

144.

145.

193

resupplied to nitrogen starved Arabidopsis seedlings. Plant J 69:978–995. doi:10.1111/j.1365-313X.2011.04848.x Nita-Lazar A, Saito-Benz H, White FM (2008) Quantitative phosphoproteomics by mass spectrometry: past, present, and future. Proteomics 8:4433–4443. doi:10.1002/ pmic.200800231 Ekins S, Xu JJ, Alexopoulos LG, SaezRodriguez J, Espelin CW (2008) Highthroughput protein-based technologies and computational models for drug development, efficacy, and toxicity. In: Ekins S, Xu JJ (eds) Drug efficacy, safety, and biologics discovery: emerging technologies and tools. Wiley, Hoboken, NJ, pp 29–52 Gaudet S, Janes KA, Albeck JG, Pace EA, Lauffenburger DA, Sorger PK (2005) A compendium of signals and responses triggered by prodeath and prosurvival cytokines. Mol Cell Proteomics 4:1569–1590. doi:10.1074/ mcp. M500158-MCP200 Janes KA, Albeck JG, Gaudet S, Sorger PK, Lauffenburger DA, Yaffe MB (2005) A systems model of signaling identifies a molecular basis set for cytokine-induced apoptosis. Science 310:1646–1653. doi:10.1126/ science.1116598 Ciaccio MF, Wagner JP, Chuu C-P, Lauffenburger DA, Jones RB (2010) Systems analysis of EGF receptor signaling dynamics with microwestern arrays. Nat Methods 7:148–155. doi:10.1038/nmeth.1418 Wagner JP, Lauffenburger DA (2009) Bayesian network inference of phosphoproteomic signaling networks. In: Seventh Annual Workshop on Bayes Applications, Montreal, Canada Chen WW, Schoeberl B, Jasper PJ, Niepel M, Nielsen UB, Lauffenburger DA, Sorger PK (2009) Input-output behavior of ErbB signaling pathways as revealed by a mass action model trained against dynamic data. Mol Syst Biol 5:239. doi:10.1038/msb.2008.74 Saez-Rodriguez J, Alexopoulos LG, Epperlein J, Samaga R, Lauffenburger DA, Klamt S, Sorger PK (2009) Discrete logic modelling as a means to link protein signalling networks with functional analysis of mammalian signal transduction. Mol Syst Biol 5:331. 
doi:10.1038/ msb.2009.87 Morris MK, Saez-Rodriguez J, Sorger PK, Lauffenburger DA (2010) Logic-based models for the analysis of cell signaling networks. Biochemistry 49:3216–3224. doi:10.1021/ bi902202q Hlavacek WS, Faeder JR, Blinov ML, Posner RG, Hucka M, Fontana W (2006) Rules for modeling signal-transduction systems. Sci STKE 2006:re6. doi:10.1126/stke.3442006re6

194

Guangyou Duan and Dirk Walther

146. Danos V, Feret J, Fontana W, Harmer R, Krivine J, Biosystems P, Supérieure ÉN, Polytechnique É (2007) Rule-based modelling of cellular signalling. Lect Notes Comput Sci 4703:17–41 147. Santos SDM, Verveer PJ, Bastiaens PIH (2007) Growth factor-induced MAPK network topology shapes Erk response determining PC-12 cell fate. Nat Cell Biol 9:324–330. doi:10.1038/ncb1543 148. Nelander S, Wang W, Nilsson B, She Q-B, Pratilas C, Rosen N, Gennemark P, Sander C (2008) Models from experiments: combinatorial drug perturbations of cancer cells. Mol Syst Biol 4:216. doi:10.1038/msb.2008.53 149. Locasale JW, Wolf-Yadlin A (2009) Maximum entropy reconstructions of dynamic signaling networks from quantitative proteomics data.

PLoS One 4:e6522. doi:10.1371/journal. pone.0006522 150. Wu XN, Sanchez-Rodriguez C, PertlObermeyer H, Obermeyer G, Schulze WX (2013) Sucrose-induced receptor kinase SIRK1 regulates plasma membrane aquaporins in Arabidopsis. Mol Cell Proteomics 12:2856– 2873. doi:10.1074/mcp.M113.029579 151. Chen X, Shi S-P, Suo S-B, Xu H-D, Qiu J-D (2014) Proteomic analysis and prediction of human phosphorylation sites in subcellular level reveals subcellular specificity. Bioinformatics. doi:10.1093/bioinformatics/btu598 152. Van Wijk KJ, Friso G, Walther D, Schulze WX (2014) Meta-analysis of Arabidopsis thaliana phospho-proteomics data reveals compartmentalization of phosphorylation motifs. Plant Cell 26:2367–2389. doi:10.1105/tpc.114.125815
