Chapter 10

Reverse Engineering Transcriptional Gene Networks

Vincenzo Belcastro and Diego di Bernardo

Abstract

This chapter provides a step-by-step guide to inferring gene networks from gene expression profiles. The definition of a gene network is given in Subheading 1, where the different types of networks are discussed. The chapter then guides the reader through a data-gathering process to build a compendium of gene expression profiles from a public repository. Gene expression profiles are then discretized, and a statistical measure of the relationship between genes, called mutual information (MI), is computed. Gene pairs with nonsignificant MI scores are then discarded by applying one of the described pruning steps. The retained relationships are used to build a Boolean adjacency matrix, which serves as input for a clustering algorithm that divides the network into modules (or communities). The gene network can then be used as a hypothesis generator for discovering gene function and analyzing gene signatures. Some case studies are presented, and an online web tool called Netview is described.

Key words Gene network, Reverse engineering, Mutual information

Michael F. Ochs (ed.), Gene Function Analysis, Methods in Molecular Biology, vol. 1101, DOI 10.1007/978-1-62703-721-1_10, © Springer Science+Business Media, LLC 2014

1  Introduction

Since the advent of microarray technologies, computational biologists have been developing approaches to unravel the genome-wide biological networks that govern cell behavior. The Aristotelian principle that “The whole is more than the sum of its parts” lies at the heart of this effort; scientists have moved away from single-gene studies to consider the cellular system as a whole. Figure 1 gives a simplified view of the biological networks in the cell (genes, proteins, and metabolites). RNA, proteins, and metabolites in a cell cooperate and interact with each other in order to perform complex functions. The network framework describes these complex interactions as a set of nodes and edges, where nodes are biomolecules and edges connect nodes that interact functionally.

Fig. 1 A simplified schema of cell components and the different types of interaction networks between components

In what follows, we describe the inference of transcriptional gene networks (or gene–gene networks) from Gene Expression Profiles (GEPs), an approach also known as “reverse engineering”. Specifically, the algorithm presented in this chapter has been developed for reverse engineering the human and mouse transcriptional gene networks.

Hundreds of thousands of GEPs have been generated in the last decade. The algorithm we developed starts by collecting GEPs obtained with a given microarray platform. Affymetrix HG-U133A (human) and Mouse430_2 (mouse) were considered in this work [1]. Several experimental data repositories are freely accessible online, such as the Gene Expression Omnibus (GEO) [2] and ArrayExpress [3]. In 2010 [4] all the GEPs stored in GEO were imported into ArrayExpress by formatting them according to the MIAME standard [5], a standard for representing the metadata associated with biological experiments (later replaced by the simpler MAGE-TAB standard [6]). Hence, for the data-gathering step, the algorithm relies on ArrayExpress as the main source of GEPs.
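To make the overall idea concrete before the data-gathering details, the following minimal Python sketch walks through the steps named in the abstract: discretizing profiles, computing mutual information between gene pairs, and thresholding the scores into a Boolean adjacency matrix. It is only an illustration under stated assumptions: equal-width binning with three bins, a single fixed MI threshold in place of the chapter's pruning steps, and the function names `discretize`, `mutual_information`, and `adjacency` are all hypothetical choices, not the authors' implementation.

```python
import math
from collections import Counter


def discretize(profile, n_bins=3):
    """Map one gene's expression values to equal-width bin indices 0..n_bins-1."""
    lo, hi = min(profile), max(profile)
    if hi == lo:  # constant profile: put every value in bin 0
        return [0] * len(profile)
    width = (hi - lo) / n_bins
    # min(...) keeps the maximum value inside the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in profile]


def mutual_information(x, y):
    """Mutual information (in bits) between two discretized profiles."""
    n = len(x)
    count_x, count_y, count_xy = Counter(x), Counter(y), Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in count_xy.items():
        # p(a,b) * log2(p(a,b) / (p(a) * p(b))), written with raw counts
        mi += (c / n) * math.log2(c * n / (count_x[a] * count_y[b]))
    return mi


def adjacency(profiles, threshold, n_bins=3):
    """Boolean adjacency matrix: True where the pairwise MI exceeds the threshold."""
    binned = [discretize(p, n_bins) for p in profiles]
    n = len(binned)
    adj = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if mutual_information(binned[i], binned[j]) > threshold:
                adj[i][j] = adj[j][i] = True
    return adj


# Toy data (made up): gene B is proportional to gene A, gene C is unrelated.
profiles = [
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],     # gene A
    [2.0, 4.0, 6.0, 8.0, 10.0, 12.0],   # gene B
    [5.0, 1.0, 5.0, 1.0, 5.0, 1.0],     # gene C
]
adj = adjacency(profiles, threshold=0.5)  # threshold is an arbitrary assumption
```

In this toy example only genes A and B end up connected. A fixed threshold is the simplest possible pruning rule and is used here purely for illustration.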

2  Materials

2.1  Gene Expression Data

Lists of GEPs were retrieved by querying ArrayExpress for a specific microarray platform (e.g., Affymetrix GeneChip): http://www.ebi.ac.uk/arrayexpress/browse.html. The output of the query is a list of experiments that can be downloaded in table format. Save the list locally and download all experiments with a small shell script (see Methods) that uses the wget utility, available at http://www.gnu.org/software/wget/. Alternatively, programmatic access is available (see Note 1).
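As a rough illustration of the download step, the Python sketch below reads a saved experiment list and builds one wget command per experiment. Several details are assumptions rather than facts from the chapter: the list is taken to be tab-delimited with a column named `Accession`, the output directory name `zipfiles` is arbitrary, and the URL pattern is a placeholder that must be replaced with the repository's actual download location. The accessions in the toy table are made up.

```python
import csv
import io


def read_accessions(table_text, column="Accession"):
    """Return experiment accessions from a tab-delimited experiment list.

    The column name is an assumption about the exported table layout.
    """
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    return [row[column].strip() for row in reader if row.get(column)]


def wget_commands(accessions, url_template, out_dir="zipfiles"):
    """Build one wget command per experiment; nothing is downloaded here.

    -nc skips files that already exist, -P sets the download directory.
    """
    return [
        f"wget -nc -P {out_dir} {url_template.format(accession=acc)}"
        for acc in accessions
    ]


# Toy stand-in for the saved experiment list (accessions are made up).
table = (
    "Accession\tTitle\n"
    "E-XXXX-1\tExample experiment 1\n"
    "E-XXXX-2\tExample experiment 2\n"
)
# Placeholder URL pattern: replace with the repository's real file location.
template = "https://example.org/files/{accession}.zip"

commands = wget_commands(read_accessions(table), template)
for cmd in commands:
    print(cmd)
```

The printed commands can be saved to a file and run from a shell. Generating the commands first, instead of downloading directly, makes it easy to inspect the list before fetching a large number of experiments.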


Downloaded datasets can then be properly formatted using any programming language. In the next section we provide an example that runs the following R code:

    dir.create("temp")
    dir.create("tabfiles")
    zipfiles
