Methods in Molecular Biology DOI 10.1007/7651_2013_60 © Springer Science+Business Media New York

Analysis and Meta-analysis of Transcriptional Profiling in Human Epidermis

Claudia Mimoso, Ding-Dar Lee, Jiri Zavadil, Marjana Tomic-Canic, and Miroslav Blumenberg

Abstract

Because of its accessibility, skin has been among the first organs analyzed using DNA microarrays; psoriasis, melanomas, carcinomas, chronic wounds, and responses of epidermal keratinocytes in culture have been intensely investigated. Skin has everything: stem cells, differentiation, signaling, inflammation, hereditary diseases, etc. Here we provide step-by-step instructions for bioinformatics analysis of transcriptional profiling of skin. We also present methods for meta-analysis of transcription profiles from multiple contributors, available in public data repositories. Specifically, we describe the use of GCOS and RMAExpress programs for initial normalization and selection of differentially expressed genes and RankProd for meta-analysis of multiple related studies. We also describe DAVID and Lists2Networks programs for annotation of genes, and for statistically relevant identification of over- and underrepresented functional and biological categories in identified gene sets, as well as oPOSSUM for analysis of transcription factor binding sites in the promoter regions of gene sets. This work can serve as a primer for researchers embarking on skinomics, the comprehensive analysis of transcriptional changes in skin.

Keywords: Affymetrix, Annotation, Clustering, Epidermal differentiation, Gene sets, Nonparametric, Ontological categories, Skinomics, Transcriptome

1 Introduction

This contribution is a follow-up to the article titled “Comprehensive transcriptional profiling of human epidermis, reconstituted epidermal equivalents and cultured keratinocytes, using DNA microarray chips” (1). In this revision we edited and slightly modified several sections, specifically those dealing with the bench-top isolation and handling of RNA from epidermal sources. We deleted several sections dealing with algorithms that were out of date and have been superseded by more advanced approaches. Most important, we include here a novel primer for meta-analysis of data available in public repositories. The past years have witnessed a tremendous increase in transcriptional profiling research; most journals nowadays require that such data be made publicly available as a condition for publication. This has provided a treasure trove for meta-analyses, allowing researchers to integrate their own data into the common fund of knowledge and thereby

Claudia Mimoso et al.

make new discoveries, define new biological relationships and advance scientific understanding. Thus, the methods described allow for “big data” analysis approaches, which have been one of the major initiatives launched by NIH (https://commonfund.nih.gov/bd2k/index.aspx).

The advent of genomics made possible comprehensive and efficient analysis of gene expression. DNA microarrays are an ideal approach for such systematic comparisons because they can simultaneously measure the expression of, potentially, the entire genome (2). A DNA microarray is an ordered arrangement of nucleic acid sequences from thousands of different genes arrayed at fixed locations on supports, usually silicon chips or glass microscope slides. DNA chips come in two varieties: printed cDNA and synthetic oligonucleotide. The first, originated by P. Brown at Stanford (3), are often homemade, inexpensive, and two-color, i.e., a treated sample and the control sample can be hybridized and compared on the same chip. They are easy to customize for a specific application. Synthetic oligonucleotide microarrays are commercially available, at a price, but because each gene is probed with multiple oligonucleotides, they tend to be more reliable and require less redundancy. They are one-color, i.e., they probe only one sample per array, but the newer ones contain all the known human genes and are able to identify the splicing variants as well (4). To address the differences in technique, data quality, etc., the microarray community has promulgated a set of guidelines known as the “MIAME” rules (Minimum Information About a Microarray Experiment), which must be complied with if a manuscript is to be accepted in a growing list of journals.

Bioinformatics is a very fast-moving field, and new and improved approaches, types of chips, hardware, software, and data repositories are constantly being developed. To keep up with the field, we find very useful the special “Database issue,” http://nar.oxfordjournals.org/content/vol36/suppl_1/index.dtl, published by Nucleic Acids Research every January. This edition describes the function and role of all molecular biology data repositories, including those that collect microarray data. Another invaluable resource is the growing set of algorithms assembled in Bioconductor, http://www.bioconductor.org/. Particularly important for the work presented here are the Automated Affymetrix Array Analysis Umbrella Package, http://www.bioconductor.org/packages/release/bioc/html/a4.html, and the RankProd algorithm, http://www.bioconductor.org/packages/release/bioc/html/RankProd.html. Bioconductor packages are freely available, usually well described and annotated, and we find that the developers of the programs are ready to help with troubleshooting and hand-holding.

The epidermis presents the most accessible target and consequently skin was among the earliest targets of DNA microarray studies (5). The RNA from the skin surface can even be recovered using an


easy, noninvasive procedure named “tape stripping” (6). The large volume of bioinformatics data relevant to skin led to the coinage of the term Skinomics (7). Skinomics DNA microarray studies have focused on epidermal differentiation, skin cancers, inflammatory diseases, wound healing, ageing, stem cells, etc. (8). Specific to skin has been a series of studies on the effects of UV light (9–13). Melanomas and basal and squamous cell carcinomas have been intensely investigated using DNA microarrays, as has psoriasis, one of the most common human inflammatory diseases (14, 15). In addition, genomic analyses of chronic, nonhealing wounds have yielded critical information about pathogenesis that is revising clinical approaches to treatment (16–19). The presence of many different cell types in skin, unfortunately, creates difficulties. Although potentially very informative, skin samples taken directly from patients differ in proportions of various cell types, sample age and body sites, history of sun exposure, etc. (8).

Transcriptional profiling in skin is rapidly expanding. Unavoidably, some of the recommendations in this manuscript will be outdated even by the time it reaches print. Virtually every analysis tool mentioned is associated with a dedicated team of developers and programmers devoted to making it better, more versatile, and more user friendly. Therefore, dear readers, if you see an innovation or a new useful approach, please contact us at [email protected] and we will keep an updated running manual of the described procedures. Conversely, if you embark on transcriptional profiling in skin and would like to find out about new developments, or just need some hand-holding, do not hesitate to get in touch.

2 Materials

2.1 Growth and Isolation of Keratinocytes

1. Normal epidermal keratinocytes or skin samples from surgery. Human skin samples are obtained from patients undergoing elective breast reduction surgery, usually within 2–6 h after surgery.
2. Serum-free keratinocyte growth medium supplemented with 0.05 mg/ml bovine pituitary extract, 5 ng/ml epidermal growth factor, and 1 % penicillin/streptomycin (KGM from Gibco-BRL).
3. Trypsin, adjusted to 0.025 % (Gibco-BRL).
4. Trypsin inhibitor, 0.5 mg/ml in PBS (Sigma).
5. Reconstituted Human Epidermis (SkinEthic Laboratory, Nice, France).
6. Dispase (2.4 U/ml, Roche).
7. RNAse inhibitor (4 U/ml, Roche).
8. RNAlater (Ambion).


2.2 Separation of Epidermal Layers and Isolation of Basal Layer Keratinocytes

1. PBS, phosphate-buffered saline.
2. Solution of 0.05 % trypsin, 0.02 % EDTA (GibcoBRL).
3. Solution of 0.5 mg/ml trypsin inhibitor (Sigma).
4. Cell Strainer tissue filters (Falcon).
5. Magnetic beads, M-450 Rat anti-Mouse-IgG1, prepared as suggested by the manufacturer (Dynal).
6. Antibody 3E1, which binds integrin β4 (GibcoBRL).

2.3 Isolation of Total RNA, Preparation of Labeled Probes, and Hybridization

1. Qiashredders and on-column RNase-Free DNase Set (Qiagen).
2. RNeasy kits (Qiagen).
3. Trizol (Invitrogen).
4. RNAlater (Ambion).
5. Affymetrix microarrays.

2.4 URL Sites Listed

1. http://rmaexpress.bmbolstad.com/
2. http://david.abcc.ncifcrf.gov/
3. http://rana.stanford.edu/software/
4. http://www.tm4.org/
5. http://www.pangloss.com/seidel/Protocols/venn.cgi
6. http://www.cisreg.ca/cgi-bin/oPOSSUM/opossum/
7. http://nar.oxfordjournals.org/content/vol36/suppl_1/index.dtl
8. http://www.bioconductor.org/
9. http://www.bioconductor.org/packages/release/bioc/html/a4.html/
10. http://www.bioconductor.org/packages/release/bioc/html/RankProd.html
11. http://www.skinethic.com/
12. http://www.mattek.com/
13. http://www.ncbi.nlm.nih.gov/pubmed/
14. http://www.ebi.ac.uk/arrayexpress/
15. http://amp.pharm.mssm.edu/lachmann/upload/register.php/
16. http://www.add-ins.com/
17. http://www.affymetrix.com/support/technical/libraryfilesmain.affx/
18. http://www.lgtc.nl/MaRe/
19. http://www.biomart.org/biomart/martview/39658a1f84f2d1822db6e184a5cc356e
20. http://sourceforge.net/projects/arrayexpress/files/
21. https://commonfund.nih.gov/bd2k/index.aspx/


3 Methods

3.1 Provenance and Maintenance of Samples

1. Normal epidermal keratinocytes from human foreskin were initiated using 3T3 feeder layers as described (20, 21) and then frozen in liquid N2 until used. Once thawed, the keratinocytes are grown without feeder cells in defined serum-free keratinocyte growth medium supplemented with 0.05 mg/ml bovine pituitary extract, 5 ng/ml epidermal growth factor, and 1 % penicillin/streptomycin (KGM from Gibco-BRL) at 37 °C, in 5 % CO2. The medium is replaced every 2 days (Note 1).
2. The reconstituted human epidermis consists of a three-dimensional multilayered keratinocyte structure grown on an air–liquid interface, without any other cell type. These are available from SkinEthic http://www.skinethic.com/ or MatTek http://www.mattek.com/ (Note 2). The media for cell culture are usually prepared without antibiotics and antimycotic agents (22, 23).
3. Human skin samples are obtained from patients undergoing elective breast reduction surgery, usually within 2–6 h after surgery (Note 3). The fat layer and most of the dermis are removed using surgical scissors and by gentle scraping with a scalpel, leaving the epidermis as the predominant cellular structure (~0.2 mm deep). Samples are then cut into strips of approximately 0.5 × 3 cm and stored in RNAlater (Ambion) overnight at 4 °C.

3.2 Isolation of β4+ and β4- Keratinocytes from Skin

1. Skin, discarded after reduction mammoplasty, as described above, is first washed six times with PBS and excess liquid drained. Using scissors and a scalpel, fat and dermis are removed as much as possible. The tissue is cut into 3 mm wide strips and incubated with dispase (2.4 U/ml, Roche) and RNAse inhibitor (4 U/ml, Roche) at 4 °C overnight.
2. Next day, the epidermis is gently separated from the dermis using forceps, and incubated in 0.05 % trypsin, 0.02 % EDTA (GibcoBRL) at 37 °C. After 10 min, two volumes of 0.5 mg/ml trypsin inhibitor (Sigma) are added and the tissue is filtered through a Cell Strainer (Falcon). The trypsinization of the tissue is repeated twice more. The cells are collected by centrifugation, examined using trypan blue, counted and, if appropriate, the isolates combined. This represents the unfractionated, total epidermal cell population.
3. Magnetic beads, M-450 Rat anti-Mouse-IgG1, are prepared as suggested by the manufacturer (Dynal). The cells are incubated with the beads in the following ratio: 100 μl beads: 10–20 μg β4 antibody: 4 × 10^6 cells (exactly!) in 1× PBS, 0.1 % BSA, at 4 °C for 1–2 h. We used M-450 Rat anti-Mouse-IgG1 beads and


the 3E1 clone β4 antibody from GibcoBRL (24). The beads are separated on a magnetic separator for 2–3 min and washed 3–4 times with PBS, collecting and combining the nonadherent, β4- cells, which represent the suprabasal cell population. The beads bound to the β4+ basal cells are used in RNA isolation without removing the cells from the beads.

3.3 Isolation of Total RNA from Human Epidermis

To obtain RNA of appropriate quality for chip analysis from in vivo epidermis, we have tested several purification methods. After extensive experimentation, we settled on the following approach:

1. First, the epidermal cells are disrupted and the RNA is isolated using Trizol (Gibco).
2. This is followed by the use of Qiashredders to homogenize cell extracts with centrifugation at 1,800 × g for 2 min.
3. DNA is removed with on-column DNase digestion using the RNase-Free DNase Set (Qiagen). RNeasy kits from Qiagen are used to prepare the RNA according to the manufacturer’s protocols (Note 4). If the sample is not immediately processed for RNA isolation, it is cut into 3 mm-wide strips and stored in RNAlater overnight at 4 °C, then at −20 °C. With this procedure, we routinely prepare RNA of high quality (Note 5).
4. From the reconstituted epidermis and cultured keratinocytes, total RNA is isolated using Qiashredders to homogenize cell extracts, followed by the RNeasy kit procedure. The RNA samples are stored in water at −80 °C until hybridization.
5. To ensure good RNA quality, 28S and 18S ribosomal bands are visualized on a nondenaturing agarose gel and an OD260/280 spectrophotometric ratio of at least 1.8 is ascertained. Five micrograms of total RNA is reverse transcribed, amplified, and labeled as described (25).
6. Approximately 5–8 μg of total RNA is reverse transcribed, amplified, and labeled as described (9, 26). Labeled cRNA is hybridized to the arrays (Affymetrix), which are washed, stained with anti-biotin streptavidin-phycoerythrin-labeled antibody using the Affymetrix fluidics station, and then washed again according to the Affymetrix protocol.
7. Arrays are scanned using the Agilent GeneArray Scanner system (Hewlett-Packard) and GeneChip 3.0 software to determine the expression of each gene. A representative picture of a hybridized Affymetrix microarray is shown in Fig. 1.
RT-PCR, Northern and Western blot analyses that confirm microarray data will not be described here; any standard molecular biology protocol compilation can be consulted for this purpose.
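The spectrophotometric check in step 5 and the input amounts in steps 5 and 6 involve simple arithmetic that can be sketched in a few lines. The following is a minimal illustration, not part of the original protocol: the function names are hypothetical, and the conversion factor of 40 μg/ml per A260 unit is the standard approximation for single-stranded RNA at a 1 cm path length, which the text does not state and is assumed here.

```python
# Illustrative helpers for the RNA quality check (OD260/280 >= 1.8) and for
# working out how much RNA solution holds a given mass (e.g., 5 ug).
# Assumption: 1 A260 unit ~ 40 ug/ml RNA at a 1 cm path length.

RNA_UG_PER_ML_PER_A260 = 40.0  # assumed standard conversion factor


def passes_purity_check(a260, a280, min_ratio=1.8):
    """Return True when the OD260/280 ratio meets the threshold from step 5."""
    if a280 <= 0:
        raise ValueError("A280 must be positive")
    return a260 / a280 >= min_ratio


def rna_concentration_ug_per_ml(a260, dilution_factor=1.0):
    """Estimate RNA concentration of the undiluted sample from A260."""
    return a260 * RNA_UG_PER_ML_PER_A260 * dilution_factor


def volume_for_mass_ul(mass_ug, concentration_ug_per_ml):
    """Volume in microliters that contains the requested mass of RNA."""
    if concentration_ug_per_ml <= 0:
        raise ValueError("concentration must be positive")
    return mass_ug / concentration_ug_per_ml * 1000.0


if __name__ == "__main__":
    a260, a280, dilution = 0.50, 0.25, 50  # hypothetical readings
    conc = rna_concentration_ug_per_ml(a260, dilution)
    print("purity ok:", passes_purity_check(a260, a280))
    print("concentration (ug/ml):", conc)
    print("volume for 5 ug (ul):", volume_for_mass_ul(5.0, conc))
```

The readings in the usage section are invented for illustration; the gel visualization of the 28S and 18S bands remains a separate, visual check that this sketch does not replace.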


Fig. 1 Affymetrix microarray chip. The left side shows the entire chip; the zoom-in on the right shows details of pairs of perfect match and mismatch hybridization (top) and actual pixels (bottom). The picture on the bottom right shows the grid, thin white lines separating the chip features. Note that the rim pixels are discarded and only the 4 × 5 array of pixels in the middle is averaged to determine the level of hybridization

3.4 Searching for and Finding Relevant Studies in Public Repositories for Meta-analysis

Microarray data usually are deposited into annotated and curated databases. Probably the largest is The National Center for Biotechnology Information (NCBI) Gene Expression Omnibus, GEO (http://ncbi.nlm.nih.gov/geo) (27), which contains close to 1,000,000 gene expression profiles and grows by 150 % annually. Slogging through such an enormous database to find all relevant studies is somewhat difficult and more of an art than a science.

The two main data repositories that collect and annotate transcription profiling using microarrays and other high-throughput methods (e.g., SAGE and next generation sequencing, NGS) are NIH-GEO, reached at http://www.ncbi.nlm.nih.gov/pubmed, and ArrayExpress at http://www.ebi.ac.uk/arrayexpress/ (Fig. 2). The two resources overlap to a large extent, but there are also significant differences, i.e., microarray studies present in one but not the other collection. This may be due in part to different data sets in each database, and in part to differences in search engines and handling of the search terms. While there may be additional datasets in proprietary databases, e.g., classified industry-generated data, or data published but not yet submitted, the two repositories are reasonably comprehensive and abundant; we find that searching for the additional data is usually frustrating, time-consuming, and unproductive.

ArrayExpress allows specifying human studies only. Those ArrayExpress experiments that are also found in GEO have designations E-GEOD, e.g., E-GEOD-10121. ArrayExpress also allows

Fig. 2 Screen shots of GEO (a) and ArrayExpress sites (b) searches for relevant transcription profiling experiments. The search terms are highlighted in yellow in the ArrayExpress screen


specifying only experiments that do not overlap those in GEO, which have different designations, e.g., E-MTAB-482; we find this somewhat risky, because differences in search protocols may catch some of the GEO experiments not flagged by the GEO search engine, and sorting through redundant GSE- and E-GEOD- experiments is an easy task.

The flagged experiments have to be parsed individually because they include many that are not directly appropriate for the study at hand (e.g., “MicroRNA profiling by array of NCI-60 human cancer cell-lines” or arrays containing only cell lines, arrays dealing with peripheral blood cell samples, and tumor-associated fibroblasts). We also remove studies that do not contain mRNA expression profiling (i.e., those analyzing DNA methylation, SNPs, gene copy number, exon, and miRNA arrays); also left out are studies that use small proprietary microarrays, such as those with fewer than 10 K genes probed, and, currently, studies using RNA sequencing. Usually, approximately half of the studies use Affymetrix platforms, and the remainder use Agilent or Illumina; less common are Hitachi, Sentrix, GE Healthcare, or proprietary arrays.

We will illustrate the search using three examples: EGFR inhibitors, psoriasis, and retinoic acid.

1. To identify transcriptional studies in which an EGFR inhibitor was used, two repositories were searched: PubMed GEO and EMBL-EBI ArrayExpress. Several combinations of search terms were used: “EGFR & inhibitor” and “EGFR & (Lapatinib | Gefitinib | Erlotinib | Cetuximab | Panitumumab | Zalutumumab | Nimotuzumab | Matuzumab | AG1478 | inhibitor).” The following terms were not found in GEO DataSets: Panitumumab, Zalutumumab, Matuzumab. We limited the search to human samples, i.e., studies that used human microarrays. Ultimately, in GEO, the search terms “EGFR & inhibit*” flagged 191 items, including 61 series.
Setting the limits in the search to “expression profiling by array,” “genome tiling” or “sequencing” yielded 50 sets, while eliminating the SNP, promoter array, and ChIP studies. Searching ArrayExpress with “EGFR and Inhibit*” yielded 44 sets; with a few exceptions these had equivalents in the GEO DataSets. The GEO and ArrayExpress searches combined identified 67 different data sets, which were individually screened for direct comparisons of EGFR inhibitor-treated vs. untreated samples. Of the 67 sets, 22 compared EGFR inhibitor-treated with untreated samples directly; 14 sets compared EGFR inhibitor-resistant vs. sensitive cell lines; in six studies we found both inhibitor-treated vs. untreated comparisons and inhibitor-resistant vs. sensitive cell lines. In several studies multiple cell lines or tissues were treated; these should be analyzed independently, i.e., each cell line analyzed separately. One data set, GSE6128, compared long term
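The bookkeeping behind such a two-repository search, namely composing the boolean search term and collapsing ArrayExpress E-GEOD mirrors onto their GEO series, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the function names are hypothetical, the AND/OR spelling of the operators is an assumed Entrez-style rendering of the “&” and “|” used in the text, and the mapping E-GEOD-n to GSEn is inferred from the designations described above.

```python
import re

# Hypothetical helpers for combining GEO and ArrayExpress search results.
# Assumption: an ArrayExpress accession E-GEOD-<n> mirrors GEO series GSE<n>.


def build_query(target, alternatives):
    """Compose 'target AND (a OR b OR ...)' from a list of alternative terms."""
    return f"{target} AND ({' OR '.join(alternatives)})"


def to_geo_series(accession):
    """Map E-GEOD-<n> to GSE<n>; leave any other accession unchanged."""
    cleaned = accession.strip().upper()
    match = re.fullmatch(r"E-GEOD-(\d+)", cleaned)
    return f"GSE{match.group(1)}" if match else cleaned


def merge_hits(geo_hits, arrayexpress_hits):
    """Return the sorted, de-duplicated union of hits from both repositories."""
    combined = {to_geo_series(a) for a in geo_hits}
    combined |= {to_geo_series(a) for a in arrayexpress_hits}
    return sorted(combined)


if __name__ == "__main__":
    print(build_query("EGFR", ["Lapatinib", "Gefitinib", "Erlotinib"]))
    geo = ["GSE10121", "GSE6128"]
    array_express = ["E-GEOD-10121", "E-MTAB-482"]
    # E-GEOD-10121 collapses onto GSE10121, leaving three distinct data sets.
    print(merge_hits(geo, array_express))
```

The merged list is only a starting point; each data set still has to be screened individually for suitability, as described above.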
