Journal of Theoretical Biology 367 (2015) 278–285


Communities in the iron superoxide dismutase amino acid network

Yanrui Ding a,b,*, Xueqin Wang a, Zhaolin Mou a

a School of Digital Media, Jiangnan University, Wuxi, Jiangsu, 214122, P. R. China
b Key Laboratory of Industrial Biotechnology, Jiangnan University, Wuxi, Jiangsu, 214122, P. R. China

Article info

Article history: Received 6 May 2014; received in revised form 24 November 2014; accepted 28 November 2014; available online 9 December 2014

Keywords: Amino acid network; Community; Fe-SOD; Thermostability

Abstract

Amino acid network (AAN) analysis is a new way to reveal the relationship between protein structure and function. We constructed six different types of AANs based on iron superoxide dismutase (Fe-SOD) three-dimensional structure information. These Fe-SOD AANs show clear community structures when modularized by different methods. In particular, the detected communities are related to Fe-SOD secondary structures. Regular structures show better correlations with the detected communities than irregular structures, and loops weaken these correlations, which suggests that secondary structure is the unit element in the Fe-SOD folding process. In addition, a comparative analysis of the communities of mesophilic and thermophilic Fe-SOD AANs revealed that thermophilic Fe-SOD AANs had more highly associated community structures than mesophilic ones. Thermophilic Fe-SOD AANs also showed higher similarity between communities and secondary structures than mesophilic Fe-SOD AANs. The communities in Fe-SOD AANs show that dense interactions within modules can help to stabilize thermophilic Fe-SODs. © 2014 Elsevier Ltd. All rights reserved.

1. Introduction

Under natural conditions, a protein typically adopts a compact native three-dimensional form. Many kinds of interactions among amino acids sustain this form, mainly electrostatic interactions, hydrogen bonds, disulfide bonds and Van der Waals forces. A protein structure can therefore be treated as a network of amino acids connected by these interactions. Studying the protein sequence-structure-function relationship from the perspective of amino acid networks (AANs) using graph theory is a promising area of research (Aftabuddin and Kundu, 2007). Graphical approaches to biological problems can provide an intuitive picture that helps in analyzing the complicated relations in these systems (Lin and Lapointe, 2013). This has been demonstrated by many previous studies on important biological topics, such as enzyme-catalyzed reactions (Andraos, 2008; Chou and Forsen, 1980; Zhou and Deng, 1984), inhibition of HIV-1 reverse transcriptase (Althaus et al., 1993; Althaus et al., 1994), drug metabolism systems (Chou, 2010) and protein-protein interactions (Kurochkina and Choekyi, 2011; Zhou, 2011a; Zhou and Huang, 2013; Zhou, 2011b) studied using wenxiang diagrams or graphs (Maggiora, 1997). In AANs, the nodes represent amino acids and the edges represent the interactions between amino acids. There are two kinds of AANs,

* Correspondence to: Lihu Avenue 1800, School of Digital Media, Jiangnan University, Wuxi, 214122, P. R. China. Tel.: +86 510 85916500. E-mail address: [email protected] (Y. Ding).

http://dx.doi.org/10.1016/j.jtbi.2014.11.030
0022-5193/© 2014 Elsevier Ltd. All rights reserved.

unweighted networks and weighted networks. In an unweighted network, C p is considered the center of the amino acid. If the distance between two C p atoms is below a certain cutoff, the two amino acids are assumed to interact and an edge exists between them (Barabasi and Oltvai, 2004; Boede et al., 2007; Estrada, 2010; Greene and Higman, 2003). In a weighted network, the edge weight is the number of atom pairs between the side chains of the two amino acids that lie closer than 4.5 Å (Barabasi and Oltvai, 2004; Boede et al., 2007; Estrada, 2010; Greene and Higman, 2003). Based on the physicochemical properties of amino acids, Aftabuddin and Kundu (2007) constructed hydrophilic, hydrophobic and charged AANs and analyzed a series of network properties. Using the normalized strength of interactions between amino acids, Brinda and Vishveshwara (2005) proposed a novel type of unweighted network. They suggested setting critical cutoffs for different sizes of AANs and found hubs and factors in AANs related to thermostability. In addition, Jiao et al. (2013, 2007) constructed a new type of weighted network based on the energy of interactions between amino acids, which lays a foundation for studying the critical residues involved in the folding process and in active sites, and for building evolution models of proteins. To date, many network properties of AANs have been investigated (Aftabuddin and Kundu, 2007), including average degree, clustering coefficient, characteristic path length and Pearson coefficient. The global behavior of AANs is analyzed based on degree distribution, clustering coefficient and characteristic path length (Aftabuddin


and Kundu, 2007; Brinda et al., 2010; Petersen et al., 2012). AANs have been found to be complex networks with ‘small-world’ properties that fall between random networks and regular networks. Moreover, network properties such as percolation, clusters, hubs, cliques and communities have also been studied at a detailed atomic level, which helps to shed light on structural and functional factors in proteins (Brinda et al., 2010; Petersen et al., 2012; Vijayabaskar and Vishveshwara, 2010).

In this article, we compared different community detection methods, selected a suitable one to modularize the AANs, and then analyzed the resulting communities. Graph partitioning, which is frequently used in computer science and related fields, is a common method to discover communities in networks. Hierarchical clustering, including agglomerative and divisive methods, is widely used in social and biological networks. In particular, the GN divisive method based on edge betweenness (Girvan and Newman, 2002; Newman, 2004) is a widely accepted algorithm. When using the GN method, one needs an additional criterion, such as the modularity of the network, to decide which branches have real significance. It is also a computationally costly method. To address these problems, Radicchi et al. (2004) proposed a self-contained GN algorithm and a fast divisive algorithm. The self-contained GN algorithm defines communities in a strong and a weak sense, which gives a general and quantitative definition of community. The fast divisive algorithm uses the edge-clustering coefficient instead of edge betweenness. Duch and Arenas (2005) proposed an extremal optimization algorithm, a heuristic search method that improves global modularity by adjusting local variables. Newman (2006) showed that modularity can be expressed in terms of the eigenvectors of a characteristic matrix of the network, which leads to a method that is elegant and has a short run time.

The main idea of the “GN” algorithm is to progressively remove edges from the original network; the connected components of the remaining network are the communities. A comparison of these commonly used methods showed that the method based on eigenvectors is the most effective and that the “GN” algorithm is the most stable method for detecting communities in different networks (Newman, 2006). In addition, Zhang et al. (2007) improved a fuzzy clustering algorithm and applied it to communities in networks. The improved fuzzy clustering algorithm and a k-means clustering algorithm were also found to perform well in detecting communities. Recently, Sun et al. (2013) detected communities in AANs using the “GN” algorithm and found a good correlation between the detected communities and protein domains. However, the way in which they constructed the AANs had a considerable effect on the experimental results. In particular, the selection of the critical cutoff and the choice of the center of the amino acid affected the results significantly. Moreover, Gaci (Gaci, 2011; Gaci and Balev, 2009) observed that some amino acids tend to group together to form local clouds, and the composition of this organization further indicated a specific aggregation of loops around a secondary structure core, as found by the “GN” algorithm.

To further study communities in AANs, we selected a group of iron superoxide dismutases (Fe-SODs) for analysis. Fe-SODs are a class of Fe3+ metalloenzymes among the superoxide dismutases, which can prevent aging (Ding et al., 2012). Therefore, Fe-SODs are widely used in the pharmaceutical, cosmetic and food industries. Natural Fe-SODs are susceptible to heat and chemical inactivation during processing and utilization; thus, many researchers have studied Fe-SOD thermostability.
Using support vector machines and principal component analysis, researchers have proposed several indicators of increased thermostability: an increase in interactions between amino acids, a more hydrophobic core, higher secondary structure content, deleted or shortened loops, and fewer thermolabile residues (Ding et al., 2012; He et al., 2007; Merlino et al., 2010; Taylor and Vaisman, 2010). In addition, the ion pairs (Lim et al., 1997; Ursby et al., 1999),


hydrogen bonds (Lim et al., 2001), hydrophobicity (Knapp et al., 1999), polar interactions and amino acid weight (Dello Russo et al., 1997) in hyperthermophilic Fe-SODs are significantly higher than those in mesophilic Fe-SODs. Although many factors related to protein thermostability (De Vendittis et al., 2008) have been found, some proteins, such as the Fe-SODs in Table S1, still cannot be accurately classified with respect to thermal stability. For example, the mesophilic Fe-SOD from Helicobacter pylori has more hydrogen bonds, salt bridges and Van der Waals forces than the thermophilic ones, and the 3/10-helix content of the mesophilic Fe-SOD from Escherichia coli is higher than that of the thermophilic ones. Here, we studied Fe-SOD thermostability from the perspective of communities in AANs, expecting to find some factors related to Fe-SOD thermostability.

In this article, we constructed six kinds of Fe-SOD AANs based on the information in Fe-SOD three-dimensional structures and used four different methods to detect communities in the Fe-SOD AANs. We then analyzed the relationship between the detected communities and secondary structure using a novel community detected accuracy criterion. We also compared the communities in mesophilic Fe-SOD AANs with those in thermophilic Fe-SOD AANs.

2. Materials and methods

2.1. Data set

We selected the data from the PDB database (http://www.rcsb.org/pdb/home/home.do). First, we collected all of the Fe-SOD PDB codes from different prokaryotes and retrieved 33 enzymes from the PDB database. Next, we selected 13 of them by requiring the identity to be below 90% and the resolution to be below 0.3 nm. The growth temperatures were taken from the DSMZ database (http://www.dsmz.de/) and from citations in the PDB database. Basic information on the 13 Fe-SODs is listed in Table S1.

2.2. Constructing the Fe-SOD AANs

2.2.1. Network 1

C p is taken as the center of each amino acid, and the nodes are the C p atoms of the amino acids. When the distance between two C p atoms is smaller than 6.5 Å (Barabasi and Oltvai, 2004; Boede et al., 2007; Estrada, 2010; Greene and Higman, 2003), the corresponding amino acids are considered to interact and the two nodes are connected by an edge. This is an unweighted network.
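The construction of network 1 can be illustrated with a minimal Python sketch. It assumes that one representative coordinate per residue has already been extracted from the PDB file; the function name `build_contact_network` and the toy coordinates are hypothetical and are not taken from the original study.

```python
import math


def build_contact_network(centers, cutoff=6.5):
    """Link residues whose center atoms lie closer than `cutoff` (in Å).

    `centers` is a list of (x, y, z) tuples, one per residue (assumed input).
    Returns an adjacency dict: residue index -> set of neighbouring indices.
    """
    adj = {i: set() for i in range(len(centers))}
    for i in range(len(centers)):
        for j in range(i + 1, len(centers)):
            if math.dist(centers[i], centers[j]) < cutoff:
                adj[i].add(j)
                adj[j].add(i)
    return adj


# Hypothetical coordinates for four residues placed along the x axis.
toy_centers = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
toy_network = build_contact_network(toy_centers)
```

In this toy case residues 0 and 1 and residues 1 and 2 are linked, residues 0 and 2 are 7.6 Å apart and therefore not linked, and residue 3 is isolated. The all-pairs loop is quadratic in the number of residues, which should be acceptable for a single protein chain.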

2.2.2. Network 2

A weighted network is constructed on the basis of network 1. The weight of an edge is the number of atom pairs between the side chains of the two amino acids whose distance is below 4.5 Å (Barabasi and Oltvai, 2004; Boede et al., 2007; Estrada, 2010; Greene and Higman, 2003).
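The edge weight of network 2 can be sketched in the same spirit. The sketch below assumes that the side-chain atom coordinates of each residue are available as lists of (x, y, z) tuples; the names `count_close_pairs` and `weight_edges` and the toy data are hypothetical.

```python
import math


def count_close_pairs(side_chain_a, side_chain_b, cutoff=4.5):
    """Count atom pairs (one atom from each side chain) closer than `cutoff` Å."""
    return sum(
        1
        for atom_a in side_chain_a
        for atom_b in side_chain_b
        if math.dist(atom_a, atom_b) < cutoff
    )


def weight_edges(edges, side_chains, cutoff=4.5):
    """Attach a weight to every edge (i, j) of an existing unweighted network."""
    return {
        (i, j): count_close_pairs(side_chains[i], side_chains[j], cutoff)
        for i, j in edges
    }


# Hypothetical side chains with two atoms each.
toy_side_chains = {
    0: [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    1: [(4.0, 0.0, 0.0), (6.0, 0.0, 0.0)],
}
toy_weights = weight_edges([(0, 1)], toy_side_chains)
```

For the toy data, two of the four atom pairs lie within 4.5 Å, so the single edge receives a weight of 2.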

2.2.3. Network 3

This network follows the novel unweighted network based on the normalized strength of interactions between amino acids proposed by Brinda and Vishveshwara (2005).

2.2.4. Network 4

In this network, each node is an amino acid (represented by its geometric center), and the edge weight is the number of atom pairs between the side chains of the two amino acids whose distance is below 4.5 Å.


2.2.5. Network 5

This network is built on network 1 using the energy of the interactions between amino acids (Jiao et al., 2013; Jiao et al., 2007); that is, the weight of an edge is the amino acid-amino acid interaction energy.

2.2.6. Network 6

The nodes are the C p atoms of the amino acids. The edge weight is a superposition of intermolecular interactions, including the hydrogen bonds, Van der Waals forces (Vijayabaskar and Vishveshwara, 2010) and salt bridges shown in Table S1.

Next, we detected the communities in the six kinds of networks, defined a new community detected accuracy criterion to study the relationship between communities and Fe-SOD secondary structure, and compared the results between thermophilic and mesophilic Fe-SODs.

2.3. Detecting communities in the AANs

Typically, a community can be regarded as a subset of a network. Connections within the same community are dense, while connections between any two communities are sparse. In practice, communities in complex networks contain important information. For example, communities in social networks can represent real social groupings corresponding to interests or background, communities in citation networks may represent related papers on a single theme, and communities in metabolic networks may represent cycles or other functional groupings (Girvan and Newman, 2002; Newman, 2004). Here, we focus on the role of the communities detected in Fe-SOD AANs.

2.3.1. The methods to detect communities

Researchers have proposed a variety of methods to detect communities in networks. According to the comparison of several algorithms reported by Newman (2006), the most effective method is based on the eigenvectors of a characteristic matrix of the network, and the “GN” algorithm is consistently stable across different types of networks.
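As a rough illustration of the “GN” algorithm referred to above, the following Python sketch removes the edge with the highest shortest-path betweenness step by step and keeps the split with the highest modularity. It is a simplified, unweighted version written for this illustration under the assumption of a symmetric adjacency dict without self-loops; it is not the implementation used in the study, and all function names are hypothetical.

```python
from collections import deque


def edge_betweenness(adj):
    """Shortest-path betweenness of every undirected edge (Brandes-style BFS)."""
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for source in adj:
        dist, sigma, preds = {source: 0}, {source: 1.0}, {source: []}
        order, queue = [], deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:  # first visit of w
                    dist[w] = dist[v] + 1
                    sigma[w], preds[w] = 0.0, []
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(order, 0.0)
        for w in reversed(order):  # accumulate from the farthest nodes back
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                score[frozenset((v, w))] += share
                delta[v] += share
    return score  # every path is counted from both ends; the ranking is unaffected


def components(adj):
    """Connected components as a list of node sets."""
    seen, parts = set(), []
    for start in adj:
        if start in seen:
            continue
        part, stack = {start}, [start]
        while stack:
            v = stack.pop()
            for w in adj[v] - part:
                part.add(w)
                stack.append(w)
        seen |= part
        parts.append(part)
    return parts


def modularity(adj, parts):
    """Modularity of a partition, measured on the original graph."""
    two_m = sum(len(nbrs) for nbrs in adj.values())  # twice the edge count
    q = 0.0
    for part in parts:
        inside = sum(1 for u in part for v in adj[u] if v in part)
        degree = sum(len(adj[u]) for u in part)
        q += inside / two_m - (degree / two_m) ** 2
    return q


def girvan_newman(adj):
    """Return the highest-modularity partition found while removing edges."""
    work = {u: set(nbrs) for u, nbrs in adj.items()}
    best_parts = components(work)
    best_q = modularity(adj, best_parts)
    while any(work.values()):
        score = edge_betweenness(work)
        u, v = max(score, key=score.get)
        work[u].discard(v)
        work[v].discard(u)
        parts = components(work)
        q = modularity(adj, parts)
        if q > best_q:
            best_parts, best_q = parts, q
    return best_parts, best_q


# Toy graph: two triangles joined by a single bridge edge (2-3).
toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
toy_parts, toy_q = girvan_newman(toy)
```

On the toy graph the bridge edge is removed first, and the two triangles are returned as the communities. Recomputing the betweenness after every removal is what makes the procedure computationally costly for larger networks.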
Here, considering the efficiency and stability of the algorithms, we selected the “GN” algorithm (GN) and the method based on the eigenvectors of a characteristic matrix of the network (ECM) to detect communities in AANs. In addition, from the perspective of pattern recognition, we selected the improved K-means algorithm (IKM) and the improved fuzzy C-means clustering algorithm (ICM) (Zhang et al., 2007). By comparing the results of these four methods, we identified a suitable method to detect the communities in Fe-SOD AANs. The leave-one-out cross-validation method (Ron, 1995) was used to examine the community detected accuracy.

2.3.2. The relationship between communities and secondary structures

Sun et al. (2013) found that communities in AANs are related to protein domains, with a highest detected accuracy of 0.8668. Their study established a foundation for analyzing the role of communities in AANs. Here, we defined a new community detected accuracy criterion to study the relationship between communities and Fe-SOD secondary structure. The secondary structure assignments of proteins are typically obtained from the DSSP algorithm (Kabsch and Sander, 1983). We use the secondary structure of the Fe-SOD from Pseudoalteromonas haloplanktis (PDB code: 3lio) as an example. The following is the secondary structure of 3lio obtained from the DSSP algorithm, where B, E, G, H, S, T and X denote beta bridge, beta strand, 3/10-helix, alpha helix, bend, turn and blank structure, respectively. From this symbol sequence, we know which residue belongs to which kind of secondary structure.

XXXXXXXSSXTTTTTTTSXHHHHHHHTTTHHHHHHHHHHHHXTTSTTTTSX
HHHHHTTXXHHHHHHHHHHHHHHHHHHTBXTTXXSSXXHHHHHHHHHHH
SSHHHHHHHHHHHHHTXXSSEEEEEEEXTTSXEEEEEEETTXXGGGSTTXEEE
EEEEXSGGGTHHHHTTXHHHHHHHHHHHBXHHHHHHHHX

A residue and its neighboring residues that belong to the same kind of secondary structure can be taken as a group. In other words, consecutive identical secondary structure characters in the symbol sequence belong to the same group. For example, the first part of the secondary structure sequence of 3lio can be written as XXXXXXXXXX-SS-X-TTTTTTT-S-X-HHHHHHHHH-T-HHHHHHHHHHHHHHHHHHHH. For the sake of clarity, these groups are numbered as 1111111111111111-22-3-4444444-5-6-777777777-8-9999999999999999999.

Thus, the detected accuracy of communities can be defined as follows. The set of communities detected by the algorithm, $\mathrm{COMMUNITY} = \{C_1, C_2, C_3, \ldots, C_t\}$, contains $t$ communities when the network reaches its maximum modularity. The groups based on the DSSP algorithm are $\mathrm{GROUP}_{\mathrm{DSSP}} = \{G_1, G_2, G_3, \ldots, G_p\}$, which contains $p$ groups. In practice, $t$ is much smaller than $p$ when the modularity of the network reaches a maximum ($t \ll p$). Ideally, for any $G_i$ there exists a parent set $C_j$ that contains $G_i$. Because $t \ll p$ and a group may be split across communities, we can only find the nearest $C_j$ for each $G_i$, taken here as the community that contains most of its residues. Thus, we compute the community detected accuracy using the following formula:

$\mathrm{Accuracy} = N_{TD} / N$,

where $N_{TD}$ is the number of residues in the whole Fe-SOD that are accurately detected, and $N$ is the size of the network (the total number of residues in the Fe-SOD sequence). For the secondary structure detected accuracy, $N_{TD}$ is the number of accurately detected residues within one kind of secondary structure and $N$ is the total number of residues of that kind of secondary structure in the Fe-SOD. Here, we use a simple example to illustrate how to compute the accuracy.
Consider the following secondary structure sequence from the DSSP algorithm:

HHHHHHHHHHHHHHH-XXXXXXXXXXXXXX-HHHHHHHHHHHH

As mentioned above, consecutive identical secondary structure characters belong to the same group. There are three groups, named G1 (111111111111111), G2 (22222222222222) and G3 (333333333333), and the sequence is numbered as follows:

111111111111111-22222222222222-333333333333

Next, communities are detected in the amino acid network constructed from the three-dimensional structure. There are three communities (as shown in Fig. 1), named community 1 (C1), community 2 (C2) and community 3 (C3). From Fig. 1, the residues in G1, G2 and G3 are distributed over nodes in different communities. Taking G1 as an example, ten residues are in C1, two residues are in C2, and three residues are in C3. The group G1 therefore corresponds to community 1, and ten residues of G1 are detected correctly. Similarly, we counted the correctly detected residues in G2 and G3, which number 8 and 10, respectively. Using the formula $\mathrm{Accuracy} = N_{TD}/N$ with $N_{TD} = 10 + 8 + 10 = 28$ and $N = 41$, the community detected accuracy is 28/41 = 0.683. For the secondary structure detected accuracy (taking the H structure as an example), $N_{TD} = 10 + 10 = 20$ and $N = 27$, so the helix detected accuracy is 20/27 = 0.741.
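The worked example above can be reproduced with a short Python sketch. It assumes that the "nearest" community of a group is the one holding most of its residues, as in the example. The residue counts stated in the text are used for the toy labels, but how the incorrectly assigned residues of G2 and G3 are spread over the other communities is an assumption made here for illustration only.

```python
from collections import Counter
from itertools import groupby


def dssp_groups(ss):
    """Split a DSSP string into runs of identical symbols: (symbol, start, end)."""
    groups, start = [], 0
    for symbol, run in groupby(ss):
        length = len(list(run))
        groups.append((symbol, start, start + length))
        start += length
    return groups


def detected_accuracy(ss, community, symbol=None):
    """Accuracy = N_TD / N, optionally restricted to one secondary structure type.

    `community[k]` is the community label of residue k. Each group is matched
    to the community that contains most of its residues (assumed criterion).
    """
    correct = total = 0
    for sym, start, end in dssp_groups(ss):
        if symbol is not None and sym != symbol:
            continue
        counts = Counter(community[start:end])
        correct += counts.most_common(1)[0][1]
        total += end - start
    return correct / total


# Toy data following the example: G1 = 15 H, G2 = 14 X, G3 = 12 H.
ss = "H" * 15 + "X" * 14 + "H" * 12
community = (
    [1] * 10 + [2] * 2 + [3] * 3    # G1: ten residues in C1 (from the text)
    + [2] * 8 + [1] * 3 + [3] * 3   # G2: eight correct, remainder assumed
    + [3] * 10 + [1] * 2            # G3: ten correct, remainder assumed
)

print(round(detected_accuracy(ss, community), 3))              # 0.683
print(round(detected_accuracy(ss, community, symbol="H"), 3))  # 0.741
```

The two printed values match the community detected accuracy (28/41) and the helix detected accuracy (20/27) derived in the text.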

3. Results and discussion

3.1. Communities in Fe-SOD AANs

3.1.1. Modularity of Fe-SOD AANs

A good partition of a network into communities must contain as many as possible within-community links and as few as possible


Fig. 1. An example to illustrate the relationship between communities and secondary structures.

Fig. 2. The average modularity of Fe-SOD AANs. The modularity in ordinates is the average of six types of AANs, and the four colors represent the different methods used.

between-community links. Following the definition of network modularity by Newman and Girvan [12], the modularity is computed as the fraction of the edges that lie within communities minus the fraction expected by chance. The last row in Table S2 shows that the average modularity ranges from 0.308 to 0.816. Most average modularities are larger than 0.5, and the average over all networks and all methods is 0.622, which means that Fe-SOD AANs have clear community structure (Girvan and Newman, 2002). The standard deviation in italics in Table S2 is the largest value for each kind of network. Most values in italics come from the IKM method, followed by the ECM method, which means that ECM and IKM are unstable, as seen in Fig. 2. A comparison of the standard deviations of ICM and GN shows that the standard deviation of GN is smaller than that of ICM in most cases. In addition, the average number of communities detected by ICM ranges from 3 to 6, while that detected by GN ranges from 9 to 13 (detailed data not shown). In practice, the division of AANs by GN is more reasonable. Considering further that the GN method is consistently stable and effective in detecting communities in different types of networks (Newman, 2006), we selected GN as a suitable method to detect the communities in Fe-SOD AANs. Next, we focus on the modularity computed by the GN method. For both the modularity of each Fe-SOD and the average modularity of each network, networks 5 and 6 have higher values than the other networks, and network 6 has the highest values of all. That

indicates that network 6 has the most stable communities of all the AANs studied, because greater modularity means that the network has more stable communities (Girvan and Newman, 2002).
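For reference, the modularity described at the start of this section is commonly written in the following form. The symbols $e_{ii}$ and $a_i$ are notation assumed here for illustration and are not defined in this article; for weighted AANs, edge counts would presumably be replaced by sums of edge weights.

```latex
Q = \sum_{i=1}^{t} \left( e_{ii} - a_i^{2} \right)
```

Here $e_{ii}$ is the fraction of all edges that have both ends in community $C_i$, and $a_i$ is the fraction of all edge ends attached to nodes in $C_i$, so that $a_i^{2}$ is the within-community fraction expected if edges were placed at random with the same node degrees. As a check of the arithmetic, a toy network of two triangles joined by one edge (7 edges), split into its two triangles, gives $e_{ii} = 3/7$ and $a_i = 7/14$ for each community, hence $Q = 2(3/7 - 1/4) = 5/14 \approx 0.357$.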

3.1.2. The community detected accuracy of Fe-SOD AANs

As mentioned in the Methods section, because t ≪ p, several kinds of relationships exist between secondary structures and communities. In the first case, a group (such as H4 in Fig. 3) is entirely included in one community, and this community contains only this group. In the second case, a group is spread over more than one community; if this happens consistently, this kind of network cannot describe the Fe-SOD three-dimensional structure accurately. In the third case, several secondary structures (sometimes forming a super-secondary structure) are located in one community. This is the most common case. In our example, some secondary structures, such as H4, H5, H6, H8 and H9 (the corresponding residues are at the top, left side and bottom of Fig. 3), each reside entirely inside a single community. As seen in Fig. 3, the communities in dark green, yellow, gray (bottom left), dark cyan and cyan are composed of all the amino acids in H4, H5, H6, H8 and H9, respectively. However, most communities contain several secondary structures, such as (S1, T1, S2), (H1, T2, H2), (T3, S3, T4, S4) and (S10, G2, H7, T10). This suggests that secondary structures work together to carry out certain functions. The average community detected accuracies of all kinds of networks by


Fig. 3. The relationship between communities and secondary structures of Fe-SOD from Pseudoalteromonas haloplanktis. The boxes in the same color represent residues grouped into one community. The side figures show the Fe-SOD primary sequence and secondary structures (the right-top-left-bottom order follows the protein sequence from the N terminus to the C terminus). The number beside each box is the residue number in the primary sequence.

GN in Table S3 and the average secondary structure detected accuracies in Table S4 are all larger than 0.8. Fig. 4 also shows that most average secondary structure detected accuracies of all Fe-SODs in all networks are above 0.75. Interestingly, except for the beta strand, the largest detected accuracies for all other kinds of secondary structure come from networks 5 and 6, which means that networks 5 and 6 describe the Fe-SOD structure very well. All these results show a good correlation between the detected communities and secondary structures (Yang et al., 2009).

3.2. Comparison of communities in mesophilic and thermophilic Fe-SOD AANs

Next, we compared the communities of mesophilic Fe-SOD AANs with those of thermophilic ones. In Fig. 5A, most D1 values of thermophilic Fe-SODs are larger than 0, while most D1 values of mesophilic Fe-SODs are less than 0. Thus, the modularity of thermophilic Fe-SOD AANs is greater than that of mesophilic ones, and thermophilic Fe-SOD AANs have better communities than mesophilic Fe-SOD AANs. In Fig. 5C, most D3 values of thermophilic Fe-SODs are greater than those of mesophilic Fe-SODs, which means that the correlation between the detected communities and the secondary structures is stronger in thermophilic Fe-SODs than in mesophilic Fe-SODs. However, in Fig. 5B, most D2 values of mesophilic Fe-SODs are higher than those of thermophilic Fe-SODs, which shows that the communities of mesophilic Fe-SODs agree more closely with the global AANs than those of thermophilic Fe-SODs. There is thus a contradiction between the global and local correlations: the dense interactions within highly thermostable Fe-SOD secondary structures strengthen the correlation between the detected communities and secondary structure (Bhattacharjee and Biswas, 2009; Ding et al., 2012; Girvan and Newman, 2002; Taylor and Vaisman, 2010); however, thermophilic Fe-SODs also have many interactions between secondary structures, which may weaken the local correlations between the detected communities and secondary structure (Ding et al., 2012; Girvan and Newman, 2002; Taylor and Vaisman, 2010).

4. Conclusion

AANs are a promising way to represent protein structure. Analyzing protein structure from the perspective of complex networks can not only verify existing theories but also suggest new hypotheses, which promotes biological and technological development. In this article, we studied the communities in Fe-SOD AANs. We constructed six types of Fe-SOD AANs based on the information contained in three-dimensional structures. We found that Fe-SOD AANs contain clear community structures and that there is a solid correlation between these detected communities and secondary structures. By comparing the communities in mesophilic and thermophilic Fe-SOD AANs, we found that thermophilic


Fig. 4. Secondary structure detected accuracies of Fe-SOD AANs. Each value is the average of 13 Fe-SODs. B, E, G, H, S, T and X are the beta bridge, beta strand, 3/10-helix, alpha helix, bend, turn and blank structure, respectively.

Fig. 5. Comparison of communities in Fe-SOD AANs with different thermostability. Panel A compares the modularity of mesophilic and thermophilic AANs; panel B compares the community detected accuracy of mesophilic and thermophilic AANs; panel C compares the secondary structure detected accuracies of mesophilic and thermophilic AANs. In each panel, the horizontal axis represents the 13 Fe-SODs, where 1–7 are the mesophilic Fe-SODs (1bsm, 1dt0, 1ids, 1mmm, 2w7w, 3cei and 3lio) and 8–13 are the thermophilic Fe-SODs (1b06, 1coj, 1dqi, 1ma1, 1my6 and 1wb7). The vertical axes indicate D1, D2 and D3, where

$D1_{\text{mesophilic Fe-SOD}} = \mathrm{Modularity}_{\text{mesophilic Fe-SOD}} - \mathrm{Average}_{\text{thermophilic Fe-SODs}}$;
$D1_{\text{thermophilic Fe-SOD}} = \mathrm{Modularity}_{\text{thermophilic Fe-SOD}} - \mathrm{Average}_{\text{mesophilic Fe-SODs}}$;
$D2_{\text{mesophilic Fe-SOD}} = \mathrm{Accuracy}_{\text{mesophilic Fe-SOD}} - \mathrm{Average}_{\text{thermophilic Fe-SODs}}$;
$D2_{\text{thermophilic Fe-SOD}} = \mathrm{Accuracy}_{\text{thermophilic Fe-SOD}} - \mathrm{Average}_{\text{mesophilic Fe-SODs}}$;
$D3_{\text{mesophilic Fe-SOD}} = \mathrm{Accuracy}_{\text{mesophilic Fe-SOD}} - \mathrm{Average}_{\text{thermophilic Fe-SODs}}$;
$D3_{\text{thermophilic Fe-SOD}} = \mathrm{Accuracy}_{\text{thermophilic Fe-SOD}} - \mathrm{Average}_{\text{mesophilic Fe-SODs}}$.

In D2 the accuracy is the community detected accuracy (panel B), and in D3 it is the secondary structure detected accuracy (panel C).

Fe-SOD AANs had more stable communities than mesophilic Fe-SOD AANs, and that thermophilic Fe-SOD AANs had a stronger correlation between the detected communities and secondary structures than mesophilic Fe-SOD AANs. These results indicate that thermophilic Fe-SOD secondary structures have dense inner interactions that are able to strengthen the community structure


of AANs. In future research, we will apply this method to structure-function studies of proteins in general. Since user-friendly and publicly accessible web-servers represent the future direction for developing practical and more useful models, analysis methods and predictors, as reflected by a series of recent publications (Chen et al., 2014a, 2014b; Ding et al., 2014; Guo et al., 2014; Liu et al., 2014; Qiu et al., 2014; Xu et al., 2014), we will make efforts in our future work to provide a web-server for the method presented in this article.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (21001053) and the Project Sponsored by the Scientific Research Foundation for the Returned Overseas Chinese Scholars, State Education Ministry.

Appendix A. Supporting information

Supplementary data associated with this article can be found in the online version at http://dx.doi.org/10.1016/j.jtbi.2014.11.030.

References

Aftabuddin, M., Kundu, S., 2007. Hydrophobic, hydrophilic, and charged amino acid networks within protein. Biophys. J. 93 (1), 225–231.
Althaus, I.W., Gonzales, A.J., Chou, J.J., Romero, D.L., Deibel, M.R., Chou, K.C., Kezdy, F.J., Resnick, L., Busso, M.E., So, A.G., 1993. The quinoline U-78036 is a potent inhibitor of HIV-1 reverse transcriptase. J. Biol. Chem. 268 (20), 14875–14880.
Althaus, I.W., Chou, J.J., Gonzales, A.J., Deibel, M.R., Chou, K.C., Kezdy, F.J., Romero, D.L., Thomas, R.C., Aristoff, P.A., Tarpley, W.G., 1994. Kinetic studies with the non-nucleoside human immunodeficiency virus type-1 reverse transcriptase inhibitor U-90152E. Biochem. Pharmacol. 47 (11), 2017–2028.
Andraos, J., 2008. Kinetic plasticity and the determination of product ratios for kinetic schemes leading to multiple products without rate laws – New methods based on directed graphs. Can. J. Chem. 86, 342–357.
Barabasi, A.L., Oltvai, Z.N., 2004. Network biology: understanding the cell's functional organization. Nat. Rev. Genet. 5 (2), 101–113.
Bhattacharjee, N., Biswas, P., 2009. Structural patterns in alpha helices and beta sheets in globular proteins. Protein Peptide Lett. 16 (8), 953–960.
Boede, C., Kovacs, I.A., Szalay, M.S., Palotai, R., Korcsmaros, T., Csermely, P., 2007. Network analysis of protein dynamics. FEBS Lett. 581 (15), 2776–2782.
Brinda, K.V., Vishveshwara, S., 2005. A network representation of protein structures: Implications for protein stability. Biophys. J. 89 (6), 4159–4170.
Brinda, K.V., Vishveshwara, S., Vishveshwara, S., 2010. Random network behaviour of protein structures. Mol. Biosyst. 6 (2), 391–398.
Chen, W., Feng, P.-M., Lin, H., Chou, K.-C., 2014a. iSS-PseDNC: identifying splicing sites using pseudo dinucleotide composition. Biomed Res. Int 2014 (12), 623149.
Chen, W., Lei, T.-Y., Jin, D.-C., Lin, H., Chou, K.-C., 2014b. PseKNC: A flexible web server for generating pseudo K-tuple nucleotide composition. Anal. Biochem. 456, 53–60.
Chou, K.-C., 2010. Graphic rule for drug metabolism systems. Curr. Drug Metab. 11 (4), 369–378.
Chou, K.C., Forsen, S., 1980. Graphical rules for enzyme-catalysed rate laws. Biochem. J. 187 (3), 829–835.
De Vendittis, E., Castellano, I., Cotugno, R., Ruocco, M.R., Raimo, G., Masullo, M., 2008. Adaptation of model proteins from cold to hot environments involves continuous and small adjustments of average parameters related to amino acid composition. J. Theor. Biol. 250 (1), 156–171.
Dello Russo, A., Rullo, R., Nitti, G., Masullo, M., Bocchini, V., 1997. Iron superoxide dismutase from the archaeon Sulfolobus solfataricus: average hydrophobicity and amino acid weight are involved in the adaptation of proteins to extreme environments. Biochim. Biophys. Acta. 1343 (1), 23–30.
Ding, H., Deng, E.-Z., Yuan, L.-F., Liu, L., Lin, H., Chen, W., Chou, K.-C., 2014. iCTXType: a sequence-based predictor for identifying the types of conotoxins in targeting ion channels. Biomed. Res. Int 72, 286419.
Ding, Y., Cai, Y., Han, Y., Zhao, B., Zhu, L., 2012. Application of principal component analysis to determine the key structural features contributing to iron superoxide dismutase thermostability. Biopolymers 97 (11), 864–872.
Duch, J., Arenas, A., 2005. Community detection in complex networks using extremal optimization. Phys. Rev. E Stat. Nonlin. Soft Matter Phys 72 (2Pt2), 027104.
Estrada, E., 2010. Universality in protein residue networks. Biophys. J. 98 (5), 890–900.

Gaci, O., 2011. Community structure description in amino acid interaction networks. Interdiscip. Sci. 3 (1), 50–56.
Gaci, O., Balev, S., 2009. Node degree distribution in amino acid interaction networks. In: Bioinformatics and Biomedicine Workshop, IEEE International Conference, IEEE Computer Society. Computational Structural Bioinformatics Workshop, November 2009, vol. 1 (1), Washington D.C., United States. pp. 107–112.
Girvan, M., Newman, M.E.J., 2002. Community structure in social and biological networks. Proc. Natl. Acad. Sci. USA 99 (12), 7821–7826.
Greene, L.H., Higman, V.A., 2003. Uncovering network systems within protein structures. J. Mol. Biol. 334 (4), 781–791.
Guo, S.-H., Deng, E.-Z., Xu, L.-Q., Ding, H., Lin, H., Chen, W., Chou, K.-C., 2014. iNuc-PseKNC: a sequence-based predictor for predicting nucleosome positioning in genomes with pseudo k-tuple nucleotide composition. Bioinformatics 30 (11), 1522–1529.
He, Y.-Z., Fan, K.-Q., Jia, C.-J., Wang, Z.-J., Pan, W.-B., Huang, L., Yang, K.-Q., Dong, Z.-Y., 2007. Characterization of a hyperthermostable Fe-superoxide dismutase from hot spring. Appl. Microbiol. Biotechnol. 75 (2), 367–376.
Jiao, X., Yang, L., Chen, W., M., 2013. A modified amino acid network model contains similar and dissimilar weight. Comput. Math. Methods Med. 8, 197892.
Jiao, X., Chang, S., Li, C.-h., Chen, W.-z., Wang, C.-x., 2007. Construction and application of the weighted amino acid network based on energy. Phys. Rev. E Stat. Nonlin. Soft Matter Phys. 75 (5Pt1), 051903.
Kabsch, W., Sander, C., 1983. Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22 (12), 2577–2637.
Knapp, S., Kardinahl, S., Hellgren, N., Tibbelin, G., Schafer, G., Ladenstein, R., 1999. Refined crystal structure of a superoxide dismutase from the hyperthermophilic archaeon Sulfolobus acidocaldarius at 2.2 Å resolution. J. Mol. Biol. 285 (2), 689–702.
Kohavi, R., 1995. A study of cross-validation and bootstrap for accuracy estimation and model selection. Proc. Fourteenth Int. Joint Conf. Artif. Intell. 2 (12), 1137–1143.
Kurochkina, N., Choekyi, T., 2011. Helix–helix interfaces and ligand binding. J. Theor. Biol. 283 (1), 92–102.
Lim, J.H., Yu, Y.G., Han, Y.S., Cho, S., Ahn, B.Y., Kim, S.H., Cho, Y., 1997. The crystal structure of an Fe-superoxide dismutase from the hyperthermophile Aquifex pyrophilus at 1.9 Å resolution: structural basis for thermostability. J. Mol. Biol. 270 (2), 259–274.
Lim, J.H., Hwang, K.Y., Choi, J., Lee, D.Y., Ahn, B.Y., Cho, Y., Kim, K.S., Han, Y.S., 2001. Mutational effects on thermostable superoxide dismutase from Aquifex pyrophilus: understanding the molecular basis of protein thermostability. Biochem. Biophys. Res. Commun. 288 (1), 263–268.
Lin, S., Lapointe, J., 2013. Theoretical and experimental biology in one – A symposium in honour of Professor Kuo-Chen Chou's 50th anniversary and Professor Richard Giegé's 40th anniversary of their scientific careers. J. Biomed. Sci. Eng. 6, 435–442.
Liu, B., Zhang, D., Xu, R., Xu, J., Wang, X., Chen, Q., Dong, Q., Chou, K.-C., 2014. Combining evolutionary information extracted from frequency profiles with sequence-based kernels for protein remote homology detection. Bioinformatics 30 (4), 472–479.
Maggiora, G.M., 1997. Disposition of amphiphilic helices in heteropolar environments. Proteins 28 (1), 99–108.
Merlino, A., Krauss, I.R., Castellano, I., De Vendittis, E., Rossi, B., Conte, M., Vergara, A., Sica, F., 2010. Structure and flexibility in cold-adapted iron superoxide dismutases: The case of the enzyme isolated from Pseudoalteromonas haloplanktis. J. Struct. Biol. 172 (3), 343–352.
Newman, M.E.J., 2004. Fast algorithm for detecting community structure in networks. Phys. Rev. E Stat. Nonlin. Soft Matter Phys. 69 (6pt2), 066133.
Newman, M.E.J., 2006. Finding community structure in networks using the eigenvectors of matrices. Phys. Rev. E Stat. Nonlin. Soft Matter Phys. 74 (3pt2), 036104.
Petersen, S.B., Neves-Petersen, M.T., Henriksen, S.B., Mortensen, R.J., Geertz-Hansen, H.M., 2012. Scale-free behaviour of amino acid pair interactions in folded proteins. PLoS One 7 (7), e41322.
Qiu, W.-R., Xiao, X., Chou, K.-C., 2014. iRSpot-TNCPseAAC: identify recombination spots with trinucleotide composition and pseudo amino acid components. Int. J. Mol. Sci. 15 (2), 1746–1766.
Radicchi, F., Castellano, C., Cecconi, F., Loreto, V., Parisi, D., 2004. Defining and identifying communities in networks. Proc. Natl. Acad. Sci. USA 101 (9), 2658–2663.
Sun, J., Jing, R., Wu, D., Zhu, T., Li, M., Li, Y., 2013. The effect of edge definition of complex networks on protein structure identification. Comput. Math. Methods Med. 2013, 365410.
Taylor, T.J., Vaisman, I.I., 2010. Discrimination of thermophilic and mesophilic proteins. BMC Struct. Biol. 10 (Suppl. 1), S5.
Ursby, T., Adinolfi, B.S., Al-Karadaghi, S., De Vendittis, E., Bocchini, V., 1999. Iron superoxide dismutase from the archaeon Sulfolobus solfataricus: analysis of structure and thermostability. J. Mol. Biol. 286 (1), 189–205.
Vijayabaskar, M.S., Vishveshwara, S., 2010. Interaction energy based protein structure networks. Biophys. J. 99 (11), 3704–3715.
Xu, Y., Wen, X., Shao, X.-J., Deng, N.-Y., Chou, K.-C., 2014. iHyd-PseAAC: predicting hydroxyproline and hydroxylysine in proteins by incorporating dipeptide position-specific propensity into pseudo amino acid composition. Int. J. Mol. Sci. 15 (5), 7594–7610.
Yang, J.Y., Peng, Z.L., Yu, Z.G., Zhang, R.J., Anh, V., Wang, D., 2009. Prediction of protein structural classes by recurrence quantification analysis based on chaos game representation. J. Theor. Biol. 257 (4), 618–626.


Zhang, S., Wang, R.-S., Zhang, X.-S., 2007. Identification of overlapping community structure in complex networks using fuzzy C-means clustering. Physica A 374 (1), 483–490.
Zhou, G.-P., 2011a. The disposition of the LZCC protein residues in wenxiang diagram provides new insights into the protein–protein interaction mechanism. J. Theor. Biol. 284 (1), 142–148.
Zhou, G.-P., Huang, R.-B., 2013. The pH-triggered conversion of the PrPc to PrPsc. Curr. Top. Med. Chem. 13 (10), 1152–1163.


Zhou, G.P., 2011b. The structural determinations of the leucine zipper coiled-coil domains of the cGMP-dependent protein kinase Ialpha and its interaction with the myosin binding subunit of the myosin light chains phosphase. Protein Peptide Lett. 18 (10), 966–978.
Zhou, G.P., Deng, M.H., 1984. An extension of Chou's graphic rules for deriving enzyme kinetic equations to systems involving parallel reaction pathways. Biochem. J. 222 (1), 169–176.
