
Original Report

Prediction of Protein–Protein Interactions with Physicochemical Descriptors and Wavelet Transform via Random Forests

Journal of Laboratory Automation 1–10. © 2015 Society for Laboratory Automation and Screening. DOI: 10.1177/2211068215581487. jala.sagepub.com

Jianhua Jia1,2, Xuan Xiao1,3, and Bingxiang Liu1

Abstract

Protein–protein interactions (PPIs) provide valuable insight into the inner workings of cells, and studying the network of PPIs is therefore important. It is also vital to develop an automated, high-throughput method for the timely prediction of PPIs. Based on physicochemical descriptors, each protein was converted into several digital signals, which were then analyzed with the wavelet transform. With this formulation for representing protein sequence samples, the random forests algorithm was adopted to make predictions. The results on a large-scale independent-test data set show that the proposed model achieves good performance, with an accuracy of about 0.86 and a geometric mean of about 0.85. It can therefore serve as a useful supplementary tool for PPI prediction. The predictor used in this article is freely available at http://www.jci-bioinfo.cn/PPI_RF.

Keywords: physicochemical descriptor, wavelet transform, random forest, protein–protein interaction

Introduction

Proteins play a vital role in nearly all biological functions, such as composing the cellular structure and promoting chemical reactions. Many critical functions and processes in biology are sustained largely by different types of protein–protein interactions (PPIs), and PPIs are highly relevant to disease states. With the rapid development of biotechnology over the past few years, the amount of available protein data has grown enormously. In recent years, various experimental techniques have been developed for large-scale PPI analysis, such as yeast two-hybrid systems,1,2 mass spectrometry,3,4 and protein chips.5 However, these experimental approaches are tedious, time-consuming, labor-intensive, and expensive,6 and only a small fraction of PPI pairs has been analyzed by such methods.6 Hence, it is important to develop a reliable computational model to ease the identification of PPIs. In biology, it is virtually axiomatic that the sequence specifies conformation, which implies an intriguing hypothesis: the amino acid sequence alone might be sufficient to determine the conformation of the protein.7 Therefore, sequence information alone may be used to predict the interactions between two proteins via machine learning methods. To date, a number of computational models have been proposed for predicting PPIs using simple protein sequence information alone,8–13 and some impressive performances have been reported. Some methods based on the

genomic information, such as phylogenetic profiles,14,15 the gene neighborhood,16 and gene fusion events,17,18 have been developed for the prediction of PPIs by accounting for the pattern of presence or absence of a given gene across a set of genomes. However, they can be applied only to completely sequenced genomes and cannot be used for essential proteins that are common to most organisms. Sequence conservation19,20 between interacting proteins has also been reported. Martin et al.21 and Chou and Cai22 developed computational methods for PPI identification using only sequence information and achieved a prediction accuracy of about 80%. Shen et al.9 developed an improved model that reaches a higher prediction accuracy of 83.5% when applied to human PPI identification. All of these methods account for the properties of one amino acid and its two proximate amino acids via a conjoint triad method. In fact, PPIs may occur between discontinuous amino acid segments in the

1School of Information Engineering, Jingdezhen Ceramic Institute, Jingdezhen, China
2School of Computer Science, University of Birmingham, Edgbaston, Birmingham, UK
3Gordon Life Science Institute, Boston, MA, USA

Received Oct. 24, 2014.

Corresponding Author: Jianhua Jia, School of Information Engineering, Jingdezhen Ceramic Institute, 333403, Jingdezhen, China. Email: [email protected]


sequence, and the prediction ability of these sequence-based methods may benefit from taking such interactions into account. Furthermore, the prediction models used in these methods were developed from limited training samples (often, )

in the first column, followed by lines of sequence data. The words right after the ">" symbol in the single initial line are optional and are used only for identification and description.

Step 3: To get the predicted result, you only need to click on the Submit button. For example, if you use the query amino acid sequences in the Example window as input, you will see the status of your job on your screen. When the job is done, the results will be displayed on the page. Regarding computational time, the job will be completed within 15 s in most cases; however, the length of the sequence is the key factor in time consumption, and the longer the query protein sequence is, the more time is usually needed.
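To illustrate the expected input format described in the guide (a header line beginning with ">" followed by lines of sequence data), here is a minimal, hedged Python sketch of a FASTA reader one might use to check a query or batch file before submission. The file name and the helper function are illustrative only and are not part of the web server.

```python
# Minimal FASTA parser; the file name and sequences are illustrative only.
def read_fasta(path):
    """Return a list of (header, sequence) tuples from a FASTA file."""
    records, header, chunks = [], None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):            # header line in FASTA format
                if header is not None:
                    records.append((header, "".join(chunks)))
                header, chunks = line[1:], []
            else:                               # sequence data lines
                chunks.append(line)
        if header is not None:
            records.append((header, "".join(chunks)))
    return records

if __name__ == "__main__":
    for name, seq in read_fasta("query_pairs.fasta"):   # hypothetical input file
        print(name, len(seq))
```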


Figure 3.  Screenshot of the PPI Predictor Web Server.

Step 4: As shown in the lower panel of Figure 3, you may also choose batch prediction by entering your e-mail address and your desired batch input file (in FASTA format) via the Browse button. To see a sample of a batch input file, click on the Batch-Example button.
Step 5: By clicking the Citation button, you will find the relevant papers that document the detailed development of the predictor.
Step 6: Click on the Supporting Information button to download the benchmark data set used to train and test the PPI predictor.

Results and Discussion

Effect of Wavelet Functions

In this section, the influence of the wavelet function was analyzed, because a wavelet basis that matches the underlying structure of the signal allows better features to be extracted from the original protein sequences. Several properties of the wavelet basis, such as compact support, orthogonality, symmetry, smoothness, and a high order of vanishing moments, must be considered for signal processing. Ideally, a wavelet function would possess all of these properties; however, many conflicting conditions restrict the choice, and no wavelet basis function possesses all of these desirable properties simultaneously. In recent decades, Daubechies constructed a class of orthonormal wavelet basis functions with compact support and smoothness. In this study, five Daubechies wavelets23 were tested: Daubechies of order 1 (Db1), order 2 (Db2), order 3 (Db3), order 4 (Db4), and order 5 (Db5). As seen in Table 3, the training accuracy reached 0.8866 when

using the random forests algorithm as the classifier with the Db1 wavelet function used to extract the features. The number of trees in the random forests is 200, and mtry (the dimension of the subspace) is 45. When the other wavelet functions are used, the training accuracies range from 0.8454 to 0.8729. Moreover, other performance measures, such as the geometric mean, sensitivity, specificity, F-measure, and MCC, were also investigated; Table 3 shows that Db1 was the best by these measures as well. These results may be explained by a property of the Db1 wavelet: its lower vanishing moment. More nonzero coefficients are generated after the decomposition, so the diverse trees used in the random forests are easily obtained, and diversity among the component learners is necessary.40 For a single learner, in contrast, a wavelet function with a higher vanishing moment, such as Db4, is needed. In this study, the Db1 wavelet function was therefore selected for our experiments.
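To make the feature-extraction idea concrete, here is a minimal, hedged Python sketch: one residue-level physicochemical scale (a stand-in hydrophobicity table) turns a sequence into a digital signal, which is decomposed with the Db1 (Haar) wavelet using PyWavelets, and simple statistics of the coefficients serve as features. The scale values, decomposition level, and choice of statistics are assumptions for illustration; they are not the paper's exact 280-feature layout.

```python
# Hedged sketch: encode a sequence with one physicochemical scale and
# decompose it with a Daubechies-1 (Haar) wavelet.
import numpy as np
import pywt

# Kyte-Doolittle-style hydrophobicity values, standing in for one of the
# seven physicochemical descriptors used in the paper (illustrative only).
HYDROPHOBICITY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
    "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
    "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
    "Y": -1.3, "V": 4.2,
}

def dwt_features(sequence, wavelet="db1", level=4):
    """Turn a protein sequence into a fixed-length DWT feature vector.

    The decomposition level must suit the sequence length; statistics of the
    approximation and detail coefficients give a length-independent vector.
    """
    signal = np.array([HYDROPHOBICITY.get(aa, 0.0) for aa in sequence])
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    feats = []
    for c in coeffs:  # approximation + detail coefficients at each level
        feats.extend([c.mean(), c.std(), c.min(), c.max()])
    return np.array(feats)

print(dwt_features("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ").shape)
```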

Performance on the S. cerevisiae Data Set

The proposed predictor was first applied to the S. cerevisiae data set, which consists of 17,505 positive pairs and 27,204 negative pairs; 5943 positive pairs and 5943 negative pairs were randomly selected from S. cerevisiae as the training data set, and the remaining pairs were used as the independent-test data set. A 5-fold cross-validation was used to evaluate the predictor on the training data set, and the procedure was repeated 10 times. The results on the training data set, shown in Table 4, indicate that the proposed model achieves good performance.
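A hedged sketch of this evaluation protocol follows, assuming scikit-learn as the random forests implementation (the paper does not state which implementation was used): a forest of 200 trees with a subspace size of 45, scored by 5-fold cross-validation repeated 10 times. The random matrices below stand in for the real 280-dimensional feature vectors and interaction labels.

```python
# Hedged sketch of the cross-validation protocol; data are placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 280))        # placeholder 280-dim feature vectors
y = rng.integers(0, 2, size=600)       # placeholder interact / non-interact labels

clf = RandomForestClassifier(n_estimators=200, max_features=45, random_state=0)
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=10, random_state=0)
scores = cross_val_score(clf, X, y, cv=cv, scoring="accuracy")
print(f"mean accuracy {scores.mean():.4f} +/- {scores.std():.4f}")
```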


Table 3.  Performance of the Different Wavelet Functions by 10-Cross-Validation.

Wavelet Function   Accuracy   F-Measure   G-Mean   Sensitivity   Specificity   MCC
Db1                0.8866     0.8817      0.8865   0.8786        0.8940        0.7728
Db2                0.8729     0.8779      0.8719   0.8526        0.8963        0.7469
Db3                0.8625     0.8758      0.8574   0.8393        0.8943        0.7260
Db4                0.8557     0.8456      0.8540   0.8647        0.8481        0.7108
Db5                0.8454     0.8515      0.8455   0.8600        0.8298        0.6904

Table 4.  5-Fold Cross-Validation Results of the Training Data on the S. cerevisiae Data Set.

Run    Accuracy        Sensitivity     Specificity     F-Measure       MCC             G-Mean
1      0.8376±0.0035   0.8607±0.0114   0.8174±0.0098   0.8322±0.0044   0.6768±0.0072   0.8371±0.0036
2      0.8384±0.0069   0.8597±0.0097   0.8195±0.0097   0.8334±0.0080   0.6780±0.0137   0.8379±0.0071
3      0.8349±0.0048   0.8538±0.0134   0.8182±0.0088   0.8304±0.0055   0.6710±0.0101   0.8345±0.0049
4      0.8429±0.0058   0.8656±0.0099   0.8228±0.0148   0.8379±0.0057   0.6873±0.0109   0.8425±0.0056
5      0.8417±0.0030   0.8677±0.0110   0.8191±0.0072   0.8358±0.0059   0.6852±0.0068   0.8410±0.0037
6      0.8336±0.0031   0.8596±0.0069   0.8111±0.0065   0.8273±0.0046   0.6690±0.0061   0.8328±0.0034
7      0.8359±0.0083   0.8610±0.0138   0.8141±0.0171   0.8300±0.0080   0.6736±0.0160   0.8353±0.0081
8      0.8412±0.0064   0.8622±0.0116   0.8231±0.0144   0.8366±0.0068   0.6840±0.0123   0.8407±0.0065
9      0.8382±0.0100   0.8617±0.0073   0.8181±0.0195   0.8328±0.0110   0.6782±0.0184   0.8376±0.0101
10     0.8401±0.0055   0.8635±0.0138   0.8195±0.0070   0.8348±0.0055   0.6816±0.0116   0.8395±0.0053
Mean   0.8395±0.0029   0.8615±0.0036   0.8183±0.0034   0.8331±0.0031   0.6785±0.0058   0.8379±0.0029
Guo    0.7796±0.0031   0.7684±0.0031   0.7822±0.0043   0.7864±0.0035   0.5099±0.0062   0.7791±0.0031

The average results of the model are 0.8395 for accuracy, 0.6785 for the MCC, 0.8379 for the geometric mean, 0.8331 for the F-measure, 0.8615 for sensitivity, and 0.8183 for specificity. After the 5-fold cross-validation, the independent-test data set (the remaining pairs of the data set) was used to further evaluate the proposed predictor. The test data comprise 11,562 positive samples and 21,261 negative samples. The experimental results, shown in Table 5, indicate that the proposed model also achieves good performance on the test data set: the average results are 0.8566 for accuracy, 0.6891 for the MCC, 0.8471 for the geometric mean, 0.8006 for the F-measure, 0.7846 for sensitivity, and 0.8984 for specificity.
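For reference, here is a small, hedged Python sketch of one common way these evaluation measures are computed from a binary confusion matrix. The paper's exact definitions (for example, of the geometric mean) may differ slightly, and the counts in the example call are illustrative only.

```python
# Common definitions of the evaluation measures from confusion-matrix counts.
import math

def evaluation_measures(tp, fp, tn, fn):
    sens = tp / (tp + fn)                     # sensitivity (recall)
    spec = tn / (tn + fp)                     # specificity
    prec = tp / (tp + fp)                     # precision
    acc = (tp + tn) / (tp + fp + tn + fn)     # accuracy
    f_measure = 2 * prec * sens / (prec + sens)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    g_mean = math.sqrt(sens * spec)           # geometric mean of sens and spec
    return {"ACC": acc, "SEN": sens, "SPEC": spec,
            "Fm": f_measure, "MCC": mcc, "G-mean": g_mean}

# Illustrative counts, not taken from the paper's experiments.
print(evaluation_measures(tp=780, fp=110, tn=890, fn=220))
```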

Comparison with Other Methods

In addition, we compared the effectiveness of our proposed model with the method proposed by Guo et al.,10 which also uses only sequence features to predict PPIs. In that method, auto-covariance (AC) features and a support vector machine (SVM) are used for prediction. AC accounts for the interactions between residues a certain distance apart in the sequence, so the model mainly takes the neighboring effect into account. The definition of AC is as follows:

$$AC_{lag,j} = \frac{1}{n-lag}\sum_{i=1}^{n-lag}\left(X_{i,j}-\frac{1}{n}\sum_{i=1}^{n}X_{i,j}\right)\times\left(X_{(i+lag),j}-\frac{1}{n}\sum_{i=1}^{n}X_{i,j}\right) \quad (10)$$

where j denotes one descriptor, such as one of the physicochemical descriptors of the amino acids that compose the protein; i is the position in sequence X; n is the length of sequence X; and lag is the lag value (the maximum distance between an amino acid residue and a neighboring residue a certain number of positions away). In this work, a protein pair is converted into a 420-dimensional (2 × 30 × 7) vector by AC with a lag of 30 amino acids, where 2 is the number of protein sequences in a pair and 7 is the number of descriptors. The experimental results on the training data and the testing data are shown in Figure 4, from which we can see that the performance of the proposed model is better than that of the AC model, especially with regard to the MCC value. This means that the proposed model is more accurate on both negative and positive samples and possesses better prediction ability.
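As an illustration of Eq. (10), here is a minimal Python sketch of the AC descriptor for a single physicochemical scale; the random signal and the function name are illustrative and are not Guo et al.'s implementation. With seven descriptors and lags 1 to 30, each protein yields 210 values, and a pair yields the 420-dimensional vector described above.

```python
# Sketch of the auto-covariance (AC) descriptor of Eq. (10) for one scale j.
import numpy as np

def auto_covariance(x, max_lag=30):
    """Return AC values for lags 1..max_lag of one descriptor signal x."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    mean = x.mean()
    ac = []
    for lag in range(1, max_lag + 1):
        # (1/(n-lag)) * sum over i of (x_i - mean) * (x_{i+lag} - mean)
        ac.append(np.sum((x[: n - lag] - mean) * (x[lag:] - mean)) / (n - lag))
    return np.array(ac)

# Illustrative use: a random signal standing in for one descriptor of a protein.
signal = np.random.default_rng(1).normal(size=200)
print(auto_covariance(signal, max_lag=30).shape)   # 30 AC features per descriptor
```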


Table 5.  Performance on the Independent-Test Data of the S. cerevisiae Data Set.

Run    Accuracy        Sensitivity     Specificity     F-Measure       MCC             G-Mean
1      0.8558          0.7859          0.8958          0.7986          0.6866          0.8451
2      0.8596          0.7842          0.9043          0.8063          0.6970          0.8524
3      0.8586          0.7896          0.8981          0.8026          0.6927          0.8482
4      0.8526          0.7782          0.8960          0.7954          0.6808          0.8432
5      0.8582          0.7902          0.8969          0.8017          0.6915          0.8473
6      0.8582          0.7855          0.9007          0.8033          0.6930          0.8495
7      0.8591          0.7891          0.8994          0.8037          0.6942          0.8494
8      0.8547          0.7849          0.8945          0.7969          0.6840          0.8436
9      0.8518          0.7737          0.8983          0.7957          0.6803          0.8439
10     0.8574          0.7846          0.8998          0.8021          0.6911          0.8485
Mean   0.8566±0.0026   0.7846±0.0049   0.8984±0.0027   0.8006±0.0035   0.6891±0.0055   0.8471±0.0029
Guo    0.7865±0.0030   0.6485±0.0044   0.8500±0.0034   0.7219±0.0029   0.5171±0.0048   0.7929±0.0025

[Figure 4: bar charts comparing the proposed PCH-and-WT predictor with the AC model on ACC, SEN, SPEC, F-measure, MCC, and G-mean; (A) training data set, (B) testing data set.]

Figure 4.  Comparison of the proposed predictor with Guo's predictor (A) on the training data set and (B) on the testing data set. The values shown are means over the 10 repetitions.

Furthermore, we must point out that the number of extracted features in our model is only 280, which is fewer than in the AC model (420 features). We thus use fewer features and less computation but obtain better performance.

In this section, we also compared the results of the proposed method with those of existing methods on the H. pylori data set. The results of 10-fold cross-validation for several different methods11,21,41–44 on the H. pylori data set are shown in Table 6. In Bock and Gough's approach,41,45 several structural and physicochemical descriptors with an SVM as the classifier were used to predict PPIs. In the method of Martin et al.,21 a novel descriptor called a signature product, a product of subsequences and an expansion of a signature descriptor from chemical informatics, was developed to infer PPIs. Nanni44 developed a PPI predictor based on a K-local hyperplane, and Nanni and Lumini42 developed an ensemble of K-local hyperplanes for predicting PPIs by fusing hyperplane-distance nearest-neighbor classifiers. In another article, Nanni43 designed a feature vector based on 2-grams and input it into linear discriminant classifiers for the prediction of PPIs. Xia et al.11 developed a sequence-based predictor based on an autocorrelation descriptor and rotation forests. We can observe that our method clearly achieves the best results for precision and accuracy compared with the other approaches; only the sensitivity was slightly lower than with Xia et al.'s method. The results on the two data sets show that the proposed predictor is a useful supplementary tool for PPI prediction.

Conclusion

In this work, a new PPI prediction model is proposed that uses only the primary sequences of proteins. The protein features are extracted using physicochemical descriptors and the DWT, and the random forests algorithm is used for prediction. We evaluated the model on large-scale test data, and the prediction results clearly show that our model is effective for PPI prediction.


Table 6.  Comparison of State-of-the-Art Methods with 10-Cross-Validation.

Method                 Sensitivity   Precision   Accuracy
Bock and Gough^a       0.698         0.802       0.758
Martin^b               0.799         0.857       0.834
Nanni^c                0.806         0.851       0.830
Nanni^d                0.860         0.840       0.840
Nanni and Lumini^e     0.867         0.850       0.866
Xia^f                  0.882         0.892       0.884
Our method^g           0.867         0.910       0.887

Prec = TP/(TP + FP).
a. Results obtained by 10-cross-validation for the predictor of Bock et al.41 on the H. pylori data set. See the "Evaluation Measures" section for further explanation of 10-cross-validation.
b. Results obtained by 10-cross-validation for the predictor of Martin et al.21 on the H. pylori data set.
c. Results obtained by 10-cross-validation for the predictor of Nanni43 on the H. pylori data set.
d. Results obtained by 10-cross-validation for the predictor of Nanni44 on the H. pylori data set.
e. Results obtained by 10-cross-validation for the predictor of Nanni et al.42 on the H. pylori data set.
f. Results obtained by 10-cross-validation for the predictor of Xia et al.11 on the H. pylori data set.
g. Results obtained by 10-cross-validation for our current predictor on the H. pylori data set.

Furthermore, fewer features are used in the model, yet better performance is achieved. The PPI predictor is available on a public server (http://www.jci-bioinfo.cn/PPI_RF).

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was partially supported by the National Natural Science Foundation of China (Nos. 61261027, 61262038, 31260273, and 61202313); the Natural Science Foundation of Jiangxi Province, China (Nos. 20122BAB211033, 20122BAB201044, and 20132BAB201053); the Scientific Research Plan of the Department of Education of Jiangxi Province (GJJ14640); and the Young Teacher Development Plan of the Visiting Scholars Program, University of Jiangxi Province. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

1. Fields, S.; Song, O. A Novel Genetic System to Detect Protein-Protein Interactions. Nature. 1989, 340, 245–246. 2. Ito, T.; Chiba, T.; Ozawa, R.; et al. A Comprehensive Two-Hybrid Analysis to Explore the Yeast Protein Interactome. Proc. Natl. Acad. Sci. USA. 2001, 98, 4569–4574. 3. Gavin, A-C.; Bosche, M.; Krause, R.; et al. Functional Organization of the Yeast Proteome by Systematic Analysis of Protein Complexes. Nature. 2002, 415, 141–147. 4. Ho, Y.; Gruhler, A.; Heilbut, A.; et al. Systematic Identification of Protein Complexes in Saccharomyces cerevisiae by Mass Spectrometry. Nature. 2002, 415, 180–183. 5. Zhu, H.; Bilgin, M.; Bangham, R.; et al. Global Analysis of Protein Activities Using Proteome Chips. Science. 2001, 293, 2101–2105.

6. Han, J-D. J.; Dupuy, D.; Bertin, N.; et al. Effect of Sampling on Topology Predictions of Protein-Protein Interaction Networks. Nat. Biotechnol. 2005, 23, 839–844. 7. Anfinsen, C. B. Principles That Govern the Folding of Protein Chains. Science. 1973, 181, 223–230. 8. Gomez, S. M.; Noble, W. S.; Rzhetsky, A. Learning to Predict Protein-Protein Interactions from Protein Sequences. Bioinformatics. 2003, 19, 1875–1881. 9. Shen, J.; Zhang, J.; Luo, X.; et al. Predicting Protein-Protein Interactions Based Only on Sequences Information. Proc. Natl. Acad. Sci. USA. 2007, 104, 4337–4341. 10. Guo, Y.; Yu, L.; Wen, Z.; et al. Using Support Vector Machine Combined with Auto Covariance to Predict Protein-Protein Interactions from Protein Sequences. Nucleic Acids Res. 2008, 36, 3025–3030. 11. Xia, J-F.; Han, K.; Huang, D-S. Sequence-Based Prediction of Protein-Protein Interactions by Means of Rotation Forest and Autocorrelation Descriptor. Protein Pept. Lett. 2010, 17, 137–145. 12. Xia, J-F.; Zhao, X-M.; Huang, D-S. Predicting Protein-Protein Interactions from Protein Sequences Using Meta Predictor. Amino Acids. 2010, 39, 1595–1599. 13. Yang, L.; Xia, J-F.; Gui, J. Prediction of Protein-Protein Interactions from Protein Sequence Using Local Descriptors. Protein Pept. Lett. 2010, 17, 1085–1090. 14. Pellegrini, M.; Marcotte, E. M.; Thompson, M. J.; et al. Assigning Protein Functions by Comparative Genome Analysis: Protein Phylogenetic Profiles. Proc. Natl. Acad. Sci. USA. 1999, 96, 4285–4288. 15. Pazos, F.; Valencia, A. Similarity of Phylogenetic Trees as Indicator of Protein-Protein Interaction. Protein Eng. 2001, 14, 609–614. 16. Overbeek, R.; Fonstein, M.; D'Souza, M.; et al. Use of Contiguity on the Chromosome to Predict Functional Coupling. In Silico Biol. 1999, 1, 93–108. 17. Enright, A. J.; Iliopoulos, I.; Kyrpides, N. C.; et al. Protein Interaction Maps for Complete Genomes Based on Gene Fusion Events. Nature. 1999, 402, 86–90.


18. Marcotte, E. M.; Pellegrini, M.; Ng, H-L.; et al. Detecting Protein Function and Protein-Protein Interactions from Genome Sequences. Science. 1999, 285, 751–753. 19. Huang, T-W.; Tien, A-C.; Huang, W-S.; et al. POINT: A Database for the Prediction of Protein-Protein Interactions Based on the Orthologous Interactome. Bioinformatics. 2004, 20, 3273–3276. 20. Espadaler, J.; Romero-Isart, O.; Jackson, R. M.; et al. Prediction of Protein-Protein Interactions Using Distant Conservation of Sequence Patterns and Structure Relationships. Bioinformatics. 2005, 21, 3360–3368. 21. Martin, S.; Roe, D.; Faulon, J-L. Predicting Protein-Protein Interactions Using Signature Products. Bioinformatics. 2005, 21, 218–226. 22. Chou, K-C.; Cai, Y-D. Predicting Protein-Protein Interactions from Sequences in a Hybridization Space. J. Proteome Res. 2006, 5, 316–322. 23. Daubechies, I. The Wavelet Transform, Time-Frequency Localization and Signal Analysis. IEEE T. Inform. Theory. 1990, 36, 961–1005. 24. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32. 25. Salwinski, L.; Miller, C. S.; Smith, A. J.; et al. The Database of Interacting Proteins. Nucleic Acids Res. 2004, 32, D449–D451. 26. Li, W.; Godzik, A. CD-HIT: A Fast Program for Clustering and Comparing Large Sets of Protein or Nucleotide Sequences. Bioinformatics. 2006, 22, 1658–1659. 27. Ben-Hur, A.; Noble, W. S. Choosing Negative Examples for the Prediction of Protein-Protein Interactions. BMC Bioinformatics. 2006, 7, S2. 28. Tanford, C. Contribution of Hydrophobic Interactions to the Stability of the Globular Conformation of Proteins. J. Amer. Chem. Soc. 1962, 84, 4240–4247. 29. Hopp, T. P.; Woods, K. R. Prediction of Protein Antigenic Determinants from Amino Acid Sequences. Proc. Natl. Acad. Sci. USA. 1981, 78, 3824–3828. 30. Krigbaum, W. R.; Komoriya, A. Local Interactions as a Structure Determinant for Protein Molecules. BBA-Protein Struct. 1979, 576, 204–228.

31. Grantham, R. Amino Acid Difference Formula to Help Explain Protein Evolution. Science. 1974, 185, 862–864. 32. Charton, M.; Charton, B. I. The Structural Dependence of Amino Acid Hydrophobicity Parameters. J. Theor. Biol. 1982, 99, 629–644. 33. Rose, G. D.; Geselowitz, A. R.; Lesser, G. J.; et al. Hydrophobicity of Amino Acid Residues in Globular Proteins. Science. 1985, 229, 834–838. 34. Zhou, P.; Tian, F.; Li, B.; et al. Genetic Algorithm-Based Virtual Screening of Combinative Mode for Peptide/Protein. Acta Chim. Sinica. 2006, 64, 691–697. 35. Mallat, S. G. A Theory for Multiresolution Signal Decomposition: The Wavelet Representation. IEEE T. Pattern Anal. 1989, 11, 674–693. 36. Mallat, S. A Wavelet Tour of Signal Processing. Academic Press: New York, 1999. 37. Qiu, J-D.; Sun, X-Y.; Suo, S-B.; et al. Predicting Homo Oligomers and Hetero-Oligomers by Pseudo-Amino Acid Composition: An Approach from Discrete Wavelet Transformation. Biochimie. 2011, 93, 1132–1138. 38. Mitchell, T. M. Machine Learning. McGraw Hill: New York, 1997. 39. Matthews, B. W. Comparison of the Predicted and Observed Secondary Structure of T4 Phage Lysozyme. BBA-Protein Struct. 1975, 405, 442–451. 40. Jia, J.; Xiao, X.; Liu, B.; et al. Bagging-Based Spectral Clustering Ensemble Selection. Pattern Recogn. Lett. 2011, 32, 1456–1467. 41. Bock, J. R.; Gough, D. A. Whole-Proteome Interaction Mining. Bioinformatics. 2003, 19, 125–134. 42. Nanni, L.; Lumini, A. An Ensemble of K-Local Hyperplanes for Predicting Protein-Protein Interactions. Bioinformatics. 2006, 22, 1207–1210. 43. Nanni, L. Fusion of Classifiers for Predicting Protein-Protein Interactions. Neurocomputing. 2005, 68, 289–296. 44. Nanni, L. Hyperplanes for Predicting Protein-Protein Interactions. Neurocomputing. 2005, 69, 257–263. 45. Bock, J. R.; Gough, D. A. Predicting Protein-Protein Interactions from Primary Structure. Bioinformatics. 2001, 17, 455–460.
