Article pubs.acs.org/jpr

Molecular Portrait of Breast-Cancer-Derived Cell Lines Reveals Poor Similarity with Tumors

Paolo Cifani,† Ufuk Kirik, Sofia Waldemarson, and Peter James*

Department of Immunotechnology, CREATE Health, Lund University, House 406, Medicon Village, 223 81 Lund, Sweden

ABSTRACT: Breast-cancer-derived cell lines are an important sample source for cancer proteomics and can be classified, on the basis of transcriptomic analysis, into subgroups corresponding to the molecular subtypes observed in mammary tumors. This study describes a three-dimensional fractionation method that allows high sequence coverage and proteome-wide estimation of protein expression levels. We used this workflow to conduct an in-depth quantitative proteomic survey of five breast cancer cell lines matching all major cancer subgroups and show that, despite their different classification, these cell lines display a very high level of similarity. A proteome-wide comparison with the RNA levels observed in the same samples showed little to no correlation. Finally, we demonstrate that the proteomes of in vitro models of breast cancer display surprisingly little overlap with those of clinical samples.

KEYWORDS: breast cancer, cell lines, classification, SILAC, RNA−protein correlation, biomarkers, transcriptomics, sample fractionation



© 2015 American Chemical Society

Received: February 1, 2015 Published: June 9, 2015

DOI: 10.1021/acs.jproteome.5b00375 J. Proteome Res. 2015, 14, 2819−2827

Journal of Proteome Research

INTRODUCTION

Breast cancer is one of the most commonly diagnosed malignancies and a leading cause of death in women,1 and it has therefore been the subject of considerable research efforts over the past decades. Heterogeneity is a hallmark of mammary tumors, a feature that has focused research on the definition of molecular classification schemes capable of distinguishing between cancer types and thereby guiding prognosis and treatment choice. However, only a few molecular markers (such as the estrogen receptor (ER), the progesterone receptor (PR), and the receptor tyrosine kinase HER2)1,2 have so far been adopted for breast cancer stratification, which at present relies mostly on gross clinical parameters such as tumor size, presence of malignant cells in the lymph nodes, histological grade, and number of distant metastases. The development of high-throughput transcriptomic platforms provided the means to survey the expression of thousands of genes and enabled classification studies to shift their focus from anatomical and histological features to the intrinsic molecular composition of each tumor. As a result, several classification schemes have been proposed over the past 15 years, each based on a specific panel of RNA markers whose relative abundance is reported to discriminate between cancer subtypes. In this study, we focus on the work of Perou,3 Sørlie,4 and Hu,5 who essentially divided breast tumors into basal, luminal (A and B), HER2-positive, and normal-like groups.

Despite several drawbacks,6 immortalized breast cancer cell lines (BCCLs)7 are a very popular model system because they provide relatively large amounts of homogeneous sample while partly retaining the phenotypic characteristics of the tissue from which they originated.8,9 Many BCCLs have been classified according to the scheme above. In this study we focused on the MCF-7 (luminal A-like10) and MDA-MB-231 (basal-like11) lines, which, having been established over three decades ago, account for about 70% of the BCCL-related literature. In addition, we included the SK-BR-3 (luminal B-like12), MCF-10A (nontumorigenic13), and JIMT-1 (ErbB2-like14) lines. These cell lines represent all of the breast tumor subtypes defined above, are routinely used in many laboratories, and have had both their genomes and transcriptomes analyzed.15 In general terms, cell lines are an attractive model system for mass-spectrometry-based proteomics16 because they relax the constraint of sample size, which often prevents extensive sample fractionation, and because they provide homogeneous cell populations, thus reducing sample complexity compared with clinical samples. Moreover, cells grown in vitro can be metabolically labeled with stable isotopes, which enables a robust and reliable quantitation strategy, SILAC,5,17 that has been applied in some of the most comprehensive proteomics studies reported to date.18−20 Current proteomics pipelines are narrowing the gap with transcriptomic platforms in terms of the number of analytes that can be identified and quantified, and they are close to enabling the direct functional characterization of biological samples, a task that until recently was mainly accomplished by RNA profiling under the assumption that protein concentration is directly determined by transcript abundance. We previously reported the lack of such a correlation,21 but because most other published studies on this subject are based on yeast22 or on non-breast-derived cell lines,23 there is an unmet need for additional and more comprehensive data that specifically address mammary tumor cells. If RNA levels do not necessarily correlate with protein abundances, whether BCCLs are a faithful model of tumor cells in vivo becomes a fundamental issue that needs to be addressed. This work tackles these problems with a systematic evaluation of the RNA−protein correlation and with an extensive comparison of the proteomes of clinical samples and of their in vitro models.

EXPERIMENTAL SECTION

Cell Culture and SILAC Labeling

MDA-MB-231, MCF-7, SK-BR-3, JIMT-1, MCF-10A, CCD-1129SK (human fibroblast), and adipose-derived mesenchymal stem cells were from ATCC (Wesel, Germany). All cell lines were cultivated in 162 cm² flasks (Corning, NY, USA) at 37 °C and 5% CO2 in a humidified incubator. Cells were grown in SILAC RPMI medium (PAA, Linz, Austria) lacking L-arginine, L-lysine, and L-glutamine. Dialyzed FBS (10%; GIBCO Invitrogen, USA) and L-glutamine (2 mM; GIBCO Invitrogen) were added to all media. Insulin (2.5 μg/mL; GIBCO Invitrogen) was added to the MCF-7 and MCF-10A media, and EGF (5 ng/mL; GIBCO Invitrogen) and hydrocortisone (500 ng/mL; Sigma-Aldrich, Steinheim, Germany) were added to the MCF-10A medium only. L-Lysine (0.798 mM) and L-arginine (0.099 mM) were added to all media. (13C6)-L-Lysine·2HCl and (13C6)-L-arginine·HCl (Cambridge Isotope Laboratories, Cambridge, MA, USA) were added to the MDA-MB-231 medium, whereas (12C6)-L-lysine·2HCl and (12C6)-L-arginine·HCl (Sigma-Aldrich) were used for all other cell lines. After 6−8 cell divisions in a single flask, cells were expanded to the final required amount and harvested during the exponential growth phase. Cells were washed with cold sterile PBS, detached with trypsin, and washed twice more in cold PBS, and the cell pellets were stored at −80 °C until protein extraction. Mesenchymal stem cells were cultivated and differentiated into mature adipocytes using an adipocyte differentiation kit (ATCC) according to the supplied protocols. No significant arginine-to-proline conversion was observed in the metabolically labeled cells.

Patient and Tumor Material

Tissues were collected at Lund University Hospital and anonymized after informed consent had been given and approval had been obtained from the Ethics Committee (reg nos. LU240−01 and 2009/658). The data set comprised primary tumors of different histological grades, recurrences or second primary tumors, and nonmalignant tissue14 from patients predisposed to breast cancer. The resected samples were stored at −80 °C. A pathologist examined all samples to obtain representative, viable, and non-necrotic tumor tissue. An appropriate piece of the tumor was excised while keeping the tumor cold. The tumor was powdered in a Teflon bomb cooled in liquid nitrogen by shaking with a steel ball. The sample was divided into two tubes, one for RNA extraction and the other for protein extraction.

Protein Extraction and Sample Pooling

All chemicals except the protease inhibitor cocktail (Roche GmbH, Mannheim, Germany) were from Sigma-Aldrich. Cell pellets were thawed on ice and incubated for 30 min with lysis buffer (10 μL per 10⁶ cells; 0.1% DOC, 0.1% nOG, 0.1 M Tris−HCl at pH 7.5, 1 mM EDTA, 0.1 M NaCl, and 1× Complete Mini protease inhibitor). The dissolved pellets were centrifuged at maximum speed for 30 min at 4 °C in a benchtop centrifuge. Supernatants were collected in clean tubes, and protein concentration was determined by bicinchoninic acid assay (Sigma-Aldrich). For each SILAC pool, 2 mg of 13C6 "heavy" protein and 2 mg of 12C6 "light" protein were combined, vortexed for 30 s, mixed with a 1:7 volume of 6× SDS-PAGE loading buffer (375 mM Tris−HCl at pH 6.8, 6% SDS, 48% glycerol, 0.5 M DTT, and 0.03% bromophenol blue), and heated for 5 min at 90 °C.

SDS-PAGE and In-Gel Digestion

For each SILAC pool, 4 mg of protein was separated on 12% Criterion gels (Bio-Rad Laboratories, Hercules, CA, USA) following the manufacturer's instructions. Gels were washed in Milli-Q water, stained for 1 h with GelCode (Bio-Rad Lab), and destained overnight in Milli-Q water; each lane was then manually sliced into ten fractions. Gel slices were destained by washing three times in 25 mM ammonium bicarbonate (AMBIC) and 50% acetonitrile (ACN) and dried in a vacuum centrifuge (Speedvac). Samples were then reduced (10 mM DTT and 50 mM AMBIC for 1 h at 56 °C), alkylated (55 mM iodoacetamide (IAA) and 50 mM AMBIC for 45 min at room temperature in the dark), washed three times with 50 mM AMBIC, and dehydrated with ACN before being dried in a Speedvac. Slices were rehydrated with 12.5 μg/mL modified porcine trypsin (Promega, Madison, WI, USA) in 50 mM AMBIC, and digestion was performed overnight at 37 °C. Peptides were eluted in 75% ACN/1% formic acid (FA), pooled with the corresponding fraction from the other lanes of the gel, dried in the Speedvac, and stored as a pellet at −20 °C until LC−MS analysis.

SCX Fractionation

Peptide pellets were dissolved in 200 μL of 5 mM KH2PO4/25% ACN at pH 3 (buffer A) and loaded on an ICAT cation-exchange cartridge (part number 4326752, Applied Biosystems). The flow-through was kept as fraction no. 1. Nine more fractions per gel slice were obtained by eluting the peptides with 200 μL (1 column volume) of buffer A containing 15, 30, 50, 75, 100, 150, 250, 500, and 1000 mM KCl, respectively. Volumes were reduced to about 10 μL in a Speedvac, and 0.1% FA was added to a final volume of 50 μL.
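The three-dimensional fractionation multiplies the number of reverse-phase LC−MS/MS analyses required per SILAC pool. The Python sketch below only enumerates the runs implied by the SDS-PAGE and SCX descriptions (ten gel slices per lane, and per slice a flow-through plus nine KCl step elutions); the pool label and the tuple layout are illustrative assumptions, not part of the published workflow.

```python
from itertools import product

# SCX step elution: fraction 1 is the flow-through; fractions 2-10 are eluted
# with buffer A containing increasing KCl concentrations (mM).
KCL_STEPS_MM = [15, 30, 50, 75, 100, 150, 250, 500, 1000]
N_GEL_SLICES = 10  # each SDS-PAGE lane was cut into ten slices


def scx_fractions():
    """Labels of the ten SCX fractions obtained from one gel slice."""
    return ["flow-through"] + [f"{kcl} mM KCl" for kcl in KCL_STEPS_MM]


def lcms_runs(pool, n_slices=N_GEL_SLICES):
    """One (pool, gel slice, SCX fraction) tuple per reverse-phase LC-MS/MS run."""
    return [
        (pool, gel_slice, fraction)
        for gel_slice, fraction in product(range(1, n_slices + 1), scx_fractions())
    ]


runs = lcms_runs("MCF-7 + heavy MDA-MB-231")  # hypothetical pool label
print(len(runs))  # 100
print(runs[0])    # ('MCF-7 + heavy MDA-MB-231', 1, 'flow-through')
print(runs[-1])   # ('MCF-7 + heavy MDA-MB-231', 10, '1000 mM KCl')
```

Across the four BCCL pools, the same scheme yields 4 × 100 = 400 fractions to be resolved by reverse-phase LC in line with the mass spectrometer.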

RNA Extraction and Analysis

Total RNA was extracted from fresh cells using the RNeasy kit (Qiagen, Duesseldorf, Germany) according to the manufacturer's protocol for cells grown in vitro. RNA samples were labeled for two-color analysis and hybridized on an Agilent Whole Genome (4 × 44K) microarray (Agilent Technologies, Santa Clara, CA, USA). RNA analysis was performed by GeneLogic (Gaithersburg, MD, USA) according to standard operating procedures.


LC−MS/MS

Peptide pellets were resuspended in 8 μL of 0.1% FA/3% ACN in water and resolved using an Eksigent nano-LC 2D system (Eksigent, Dublin, CA, USA) consisting of a solvent degasser, a nanoflow pump, and a cooled autosampler. A 5 μL aliquot of each sample was loaded onto the precolumn (Zorbax 300SB-C18 5 × 5 μm, Agilent Tech.) and washed for 15 min at a constant flow of 10 μL/min of solvent A (0.1% FA in water) to remove salt from the SCX elution. Peptides were then resolved on a 150 mm × 75 μm (ID) ZORBAX SB-C18 column (Agilent Tech.) at a constant flow rate of 350 nL/min using a segmented gradient of solvent B (0.1% FA in ACN): 5 to 10% over 5 min, 10 to 50% over 3 h, and 50 to 80% over 10 min, followed by a 15 min wash at 80% solvent B. The HPLC system was coupled to an LTQ Orbitrap XL mass spectrometer (Thermo Fisher Scientific, Bremen, Germany) operated in data-dependent acquisition (DDA) mode. Survey full-scan MS spectra (m/z 400−2000) were acquired in the Orbitrap at a resolution of 60 000 at m/z 400, accumulating ions for 30 ms. The spray voltage was set to 2 kV, and the temperature of the heated capillary was set to 180 °C. The ten most intense ions from each Orbitrap survey scan were fragmented by collision-induced dissociation (CID) in the LTQ (normalized collision energy of 35, parent mass selection window of 0.5 Da, activation time of 30 ms, and minimum signal threshold for MS/MS scans of 500). Unassigned charge states and singly charged ions were excluded from fragmentation. The dynamic exclusion list was limited to a maximum of 500 masses, with a maximum retention time of 2 min and a relative mass window of 10 ppm. Xcalibur software (Thermo Scientific) controlled the HPLC, the mass spectrometer, and data acquisition.

Data Analysis

Raw files from the mass spectrometer were submitted to MaxQuant24 (version 1.2.0.13) for protein identification and quantitation. Cysteine carbamidomethylation was set as a fixed modification, and methionine oxidation and protein N-terminal acetylation were set as variable modifications. The built-in Andromeda search engine was set to search a locally stored UniProt human database (updated as of September 25, 2011) containing 37 068 sequences.32 The FDR was set at 1% at both the peptide and the protein level. Mass tolerance was set to 10 ppm for precursor ions and 0.5 Da for fragment ions. Trypsin was set as the protease, and a maximum of two missed cleavages was allowed. "Match between runs" (2 min match time window, 20 min alignment time window) was enabled to improve the identification of unfragmented precursor ions. Subsequent data handling was performed in Perseus (version 1.2.0.10) and Microsoft Excel. Proteins marked as contaminants, reverse hits, or "only identified by site", as well as those with no computed intensity, were discarded. Because an equal amount of heavy-labeled protein was analyzed in each pool, intensities were normalized so that the sum of the values for MDA-MB-231 was the same in all SILAC pools. The intensity of each protein identified in MDA-MB-231 was defined as the average of its four normalized intensities (one per pool), and the corresponding coefficient of variation (standard deviation divided by the average) was computed. Pathway analysis was conducted using FEvER25 and MetaCore version 6.13 (Thomson Reuters, NY, USA). IPA Ingenuity (Ingenuity Systems Inc., Redwood City, CA, USA) was employed to translate UniProt and transcriptomic IDs into gene symbols. GO term enrichment was performed using GOrilla26 for two unranked lists of genes, with the specific subset of identifiers (shared between cell lines and tumors, or specific to either of the two sample types) submitted as the target set and the total list of identified proteins used as the background set. Organism was restricted to H. sapiens and ontology to GO processes; the p value threshold was set to 10⁻³. The software queried the GO database as of March 21, 2015.

RESULTS

High Similarity Among BCCLs Revealed by Deep Proteome Coverage

Transcriptomic analysis of breast tumors has demonstrated that different subtypes can be distinguished on the basis of their specific RNA profiles and separated into five major groups.3−5 Interestingly, this classification was successfully applied to breast-cancer-derived cell lines as well, identifying in vitro models that display the RNA-level differences observed among clinical samples. The first goal of this work was to compile a map of the proteins identifiable in BCCLs and to verify whether the protein abundance profiles obtained by comprehensive quantitative proteomics agreed with the observations derived from RNA levels. To this end, we focused on five cell lines commonly used as model systems (on the basis of literature metadata) that cover each of the five major subtypes described by Sørlie et al.4 To quantify proteins and improve the overall depth of proteome identification, we combined isotopic protein labeling with extensive sample fractionation. MDA-MB-231, which had the fastest growth rate among the cell lines under investigation, was metabolically labeled with (13C6)-lysine and (13C6)-arginine (hereafter referred to as "heavy") and, after protein extraction, pooled pairwise with each of the other four BCCLs. We devised a three-dimensional sample fractionation strategy combining the methods that showed the best orthogonality and overall performance in a previous study conducted in our laboratory.27 Briefly, each SILAC pool was first separated at the protein level by SDS-PAGE into ten fractions, which were then digested in-gel with trypsin. Peptides from each gel slice were further separated by SCX into ten fractions, giving a total of 100 fractions per pool that were finally resolved by reverse-phase LC in line with the mass spectrometer.

Raw files were submitted to MaxQuant24 for protein identification and quantitation, which provided every identified protein with an estimate of its abundance in each cell line and, based on the coefficient of variation of the four measurements in MDA-MB-231, an estimate of technical variability (Table S2 in the Supporting Information). Although the presence of isotopologues increased sample complexity, virtually doubling the number of ions available for MS2 fragmentation, the alignment of chromatographic runs and accurate precursor mass measurements enabled the transfer of peptide-spectrum matches across different MS analyses. Repeated injection of the heavy-labeled sample28 and the limited specificity of the off-line sample fractionation increased the probability that low-abundance ions were targeted for fragmentation, thus improving the overall proteome coverage. This workflow led to the identification of a total of 26 170 peptides (FDR 0.01, Table S1 in the Supporting Information) mapping to 3417 protein groups (FDR 0.01, Table S2 in the Supporting Information).

A total of 3126 proteins (91.4%) were identified in both MDA-MB-231 and MCF-7, 2905 (85.0%) in SK-BR-3, 2714 (79.4%) in JIMT-1, and 3017 (88.2%) in MCF-10A. About 40% of all identified proteins were associated with cellular membranes, and GO enrichment for subcellular localization showed that all major cell compartments were represented (32.4% of the identifications were assigned to the nucleus, 8.4% to the endoplasmic reticulum, 6.4% to the Golgi apparatus, 14.5% to mitochondria, 22.9% to the cytosol, and 15.4% to the plasma membrane), indicating no obvious bias in protein extraction and separation. A total of 2270 proteins (about 66.5% of the identifications) were detected in all five BCCLs, and a further 521 were present in four of them, indicating that most of the proteome was in fact shared by all of the cell lines (Figure 1). Thus, at least on a qualitative level, our data confirm the observations about the "core proteome" reported by Geiger et al.,19 as expected from cell lines with the same anatomical origin. Very few of the identified proteins were specific to one cell line, and none were detected exclusively in MDA-MB-231 or MCF-7 (Table S3 in the Supporting Information). Precursors assigned to all cell-line-specific proteins had low absolute (MS1) intensity, suggesting that these ions were possibly present in the other samples as well but were not detected, either because their concentrations fell below the detection limit or because of MS undersampling. A closer analysis of the quantitative data confirmed the high degree of similarity between all BCCLs: most identified proteins had intensities within ±1 or ±2 fold-changes of the average over the five BCCLs (65.1% and 75.5%, respectively, in MDA-MB-231; Figure 2).

Figure 1. Number of proteins identified over the entire data set and for each cell line. The fraction of cell-line-specific proteins is reported in blue. The fraction of proteins shared by all five BCCLs is given in gray, and that of proteins identified in only one cell line ("cell-line-specific identifications") is given in red.

Figure 2. Distribution of relative protein quantitation. Relative quantitation is expressed as a Δ-fold change of the intensity relative to the average intensity over the five cell lines.

Jönsson and co-workers have analyzed genomic rearrangements in several BCCLs and described 76 genes affected by amplification or deletion in the cells that we characterized.15 Twenty-nine of these aberrant genes encode proteins present in our data set. To further validate our results, we focused on cell-line-specific gene deletions to verify whether a loss of genetic information correlated with a lack of the corresponding product. We could match two of the gene deletions described by Borg's group with the total absence of the corresponding protein: CDKN2A produced no signal in MCF-10A, and CDH1 produced none in SK-BR-3, although both proteins were present in other cell lines. The effects of gene amplifications on the levels of the encoded transcripts and proteins are harder to evaluate because regulatory mechanisms may mask the effects of the genetic aberration. Nonetheless, of the 24 proteins encoded by the amplified genes described by Borg's group and present in our data set, 17 were overexpressed in the cell line carrying the genomic defect, supporting the accuracy of our analysis.

Low Correlation of Protein and mRNA Relative Expression Levels

To evaluate RNA−protein correlation on a large scale, we extracted total RNA from the five BCCLs and had it analyzed by GeneLogic according to their SOPs for Agilent two-color profiling. The experimental design of the proteomic analysis was adopted, with MDA-MB-231 as the reference. RNA from the reference was labeled with Cy3, pooled systematically with Cy5-labeled RNA from each of the other four BCCLs, and hybridized on an Agilent Human GE 4 × 44K v2 Microarray (Table S4 in the Supporting Information). RNA profiling of every pool was performed four times, including label swapping, and the data were normalized as described for the proteomic analysis. To compare proteomic and transcriptomic data, we translated all IDs from both data sets into gene symbols using IPA Ingenuity and kept only the 3326 proteins with a counterpart in the RNA experiment. Similarly, proteins with missing values (i.e., without an intensity recorded in one or more of the five cell lines) were discarded, limiting our comparison to the 2216 IDs quantified in every cell line on both platforms. To compare signal intensities from transcriptomic and proteomic platforms with very different dynamic ranges (typically 3 and 7 orders of magnitude, respectively), we calculated normalized values ($V_n$) as

$$V_n^{\mathrm{RNA}} = \frac{I^{(i)}_{\mathrm{RNA}} - \bar{I}_{\mathrm{RNA}}}{\bar{I}_{\mathrm{RNA}}}, \qquad V_n^{\mathrm{Prot}} = \frac{I^{(i)}_{\mathrm{Prot}} - \bar{I}_{\mathrm{Prot}}}{\bar{I}_{\mathrm{Prot}}}$$

where "Prot" stands for "proteins", $I^{(i)}$ is the intensity of an analyte in cell line $i$, and $\bar{I}$ is the average intensity of that analyte over the five cell lines. Because $\bar{I}$ averages five nonnegative intensities, $V_n$ ranges from −1 (analyte not detected in that cell line, $I^{(i)} = 0$) to +4 (analyte detected only in that cell line, $I^{(i)} = 5\bar{I}$). This transformation therefore assigned $V_n^{\mathrm{RNA}}$ and $V_n^{\mathrm{Prot}}$ relative quantitation values in the same −1 to +4 range (Figure 3 and Table S5 in the Supporting Information), allowing their direct comparison.
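The pool normalization described in the Data Analysis section and the $V_n$ transformation can be expressed compactly. The Python sketch below is an illustration under stated assumptions, not the authors' code: the nested-dictionary layout, all function names, and the choice of the mean heavy-channel sum as the common normalization target are hypothetical (the text only states that the MDA-MB-231 sums were made equal across pools). It rescales each SILAC pool by its heavy-channel sum, derives the reference intensity and its coefficient of variation, computes $V_n$ per analyte, and correlates RNA and protein $V_n$ across genes for one cell line with a plain Pearson coefficient.

```python
from math import sqrt
from statistics import fmean, stdev


def normalize_pools(pools):
    """Rescale each SILAC pool so its heavy (MDA-MB-231) intensities sum to a common value.

    pools: {pool_name: {"heavy": {protein: intensity}, "light": {protein: intensity}}}
    Assumption: the common value is the mean heavy sum over all pools.
    """
    heavy_sums = {name: sum(p["heavy"].values()) for name, p in pools.items()}
    target = fmean(heavy_sums.values())
    scaled = {}
    for name, pool in pools.items():
        factor = target / heavy_sums[name]  # same factor for both channels of a pool
        scaled[name] = {
            channel: {prot: value * factor for prot, value in pool[channel].items()}
            for channel in ("heavy", "light")
        }
    return scaled


def reference_intensity(scaled, protein):
    """Mean normalized MDA-MB-231 intensity across pools and its CV (SD / mean)."""
    values = [pool["heavy"][protein] for pool in scaled.values() if protein in pool["heavy"]]
    mean = fmean(values)
    return mean, stdev(values) / mean


def relative_values(intensities):
    """V_n = (I_i - mean) / mean for one analyte measured across the cell lines."""
    mean = fmean(intensities.values())
    return {line: (value - mean) / mean for line, value in intensities.items()}


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def rna_protein_r(rna, prot, line):
    """Correlate RNA and protein V_n in one cell line across genes present on both platforms."""
    genes = sorted(set(rna) & set(prot))
    x = [relative_values(rna[g])[line] for g in genes]
    y = [relative_values(prot[g])[line] for g in genes]
    return pearson_r(x, y)


# Toy usage with made-up numbers (not data from this study).
LINES = ["MDA-MB-231", "MCF-7", "SK-BR-3", "JIMT-1", "MCF-10A"]
extreme = relative_values(dict(zip(LINES, [5.0, 0.0, 0.0, 0.0, 0.0])))
print(extreme["MDA-MB-231"], extreme["MCF-7"])  # 4.0 -1.0, the two ends of the V_n range

pools = {
    "MCF-7": {"heavy": {"P1": 4.0, "P2": 6.0}, "light": {"P1": 2.0, "P2": 2.0}},
    "SK-BR-3": {"heavy": {"P1": 8.0, "P2": 12.0}, "light": {"P1": 4.0, "P2": 8.0}},
}
scaled = normalize_pools(pools)
print(reference_intensity(scaled, "P1"))  # (6.0, 0.0)
```

Because $V_n$ divides each analyte by its own mean, it is insensitive to platform-specific intensity scales: a gene whose protein signal is a constant multiple of its RNA signal in every cell line yields identical RNA and protein $V_n$ profiles.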

Figure 3. Protein−RNA level correlation graphs. Correlation of the transcriptomic (x-axis) and proteomic (y-axis) relative expression levels: (a) MDA-MB-231, (b) MCF-10A, (c) JIMT-1, (d) MCF-7, and (e) SK-BR-3.

Pearson's correlation coefficients (r) were close to zero or weakly negative: −0.28 for MDA-MB-231, −0.01 for MCF-10A, −0.13 for MCF-7, −0.12 for JIMT-1, and −0.08 for SK-BR-3. Thus, in our experimental setting the relative concentration of each protein appeared to be essentially independent of the level of the corresponding RNA. Because the choice of the five cell lines covered by our study was driven by transcriptomic-derived classification schemes, we mined our data set for the classifiers reported by Perou,3 Sørlie,4 and Hu5 (gene names for the classifiers were retrieved from Mackay et al.29). Unfortunately, only a subset of each panel was covered by our proteomic data set, including 23% (72/391) of the classifiers reported by Perou (and annotated with gene symbols), 18% of those from Sørlie's work, and 30% of those from Hu's. We again compared the RNA and protein relative intensities, restricting the analysis to the IDs included in these panels, but observed no significant increase in correlation. The r values were again close to 0, with their average over the five comparisons for each panel being 0.088, 0.135, and 0.088 when only the classifiers defined by Perou, Sørlie, and Hu, respectively, were considered. To reproduce the workflow of the cited studies, we also performed hierarchical clustering based on either proteomic or transcriptomic data. The RNA-based clustering reproduced the classification described in the original studies, clearly separating the luminal and normal-like cell lines from the highly proliferating HER2+-like JIMT-1 and the basal-like MDA-MB-231. Within the cluster of BCCLs representing low-malignancy tumor types, the two luminal-like cell lines (MCF-7 and SK-BR-3) showed the highest similarity (Figure 4b). However, the proteomic analysis yielded a completely different clustering (Figure 4a), indicating that the two platforms highlighted different patterns of similarity between the cell lines.

Figure 4. Hierarchical clustering of BCCLs based on (a) proteomic and (b) transcriptomic quantitative data. MDA-MB-231 is classified as basal-like, MCF-10A as nontumorigenic, JIMT-1 as HER2+, MCF-7 as luminal A-like, and SK-BR-3 as luminal B-like.

We also reasoned that differences between cell lines could go undetected when focusing on single identifiers but might become more evident with a more system-level approach. To assess our results on a functional level, we therefore extended our analysis to pathways using FEvER,25 a novel tool developed in-house. Briefly, for each of the five cell lines we performed a statistical evaluation to retrieve functionally related proteins or mRNAs showing significantly different abundance in that cell line compared with the others (Table S6a−e in the Supporting Information). To this end, we analyzed the normalized RNA and protein relative quantifications and compared the FEvER scores calculated in each cell line from the transcriptomic and the proteomic data sets. FEvER evaluates the extent of differential regulation at the pathway level and returns probability-type scores that report the statistical significance of the differential regulation of each pathway.25 The Pearson correlation coefficients (r) of the resulting scores over all pathways were 0.26 for MCF-7, 0.23 for MDA-MB-231, 0.16 for JIMT-1, 0.38 for MCF-10A, and 0.22 for SK-BR-3 (Figure 5), indicating once again a poor correlation between RNA and protein results, even at the pathway level.

BCCL Proteome Not Representative of That of Clinical Samples

The term "cancer cell line" is ubiquitous in the literature and implies that tumor-derived immortalized cultures retain typical features of the tissues they originated from. Having established a comprehensive atlas of the BCCL proteome, we assessed the degree of similarity between these samples and the tumor tissues that they are supposed to represent. We have recently analyzed 450 tumor samples that encompass all breast cancer subtypes listed above (see PRIDE data set accession in the Supporting Information). SILAC heavy-labeled MDA-MB-231 was adopted as a standard to achieve relative protein quantitation, thus providing a direct link between the two data sets. For practical reasons, protein groups (entries in the proteomic data set consisting of multiple proteins) were reduced to their main UniProt identifier, which may have led to a slight underestimation of the overlap between the two data sets. In the tumor samples we identified 3681 proteins in total (i.e., proteins with a measured intensity for the "light" SILAC version), a number comparable with the one we obtained from cell lines (3417). However, a mere 790 proteins, 11.1% of the 7098 combined identifications (3681 + 3417), were detected both in tumors and in at least one of the BCCLs we analyzed, while 2891 were identified in tumors only and 2627 exclusively in cell lines (Figure 6). An overlap of about one-tenth of the total led us to exclude any extensive similarity between the proteomes of the BCCLs and those of the tumors they are supposed to represent.

We speculated that the differences between the two data sets could be explained by the contribution of proteins from the tumor stroma, which is not represented by the BCCLs characterized here. To test this hypothesis, we included in our analysis fibroblasts (CCD-1129SK, a cell line derived from human breast fibroblasts) and adipocytes obtained by in vitro differentiation of human adipose-derived mesenchymal stem cells. Both cell lines were independently pooled with heavy-labeled MDA-MB-231 as described above, fractionated by SDS-PAGE (15 slices), and, after in-gel digestion, analyzed with the same MS workflow adopted for the four BCCL pools. The resulting MS data were submitted to MaxQuant together with the data from the five BCCLs. This expanded data set included a further 297 proteins identified in cell lines (a total of 3714), but none of these additional species were shared with the clinical tumor data set (Figure 6).

FEvER functional analysis of the two data sets (all cell lines versus all clinical samples) partly explained the discrepancy between the two groups, showing that cells in vivo were engaged in different processes compared with those grown in vitro. For example, many pathways related to cell adhesion and hormone signaling had better coverage in the tumor samples than in cultivated cells. In contrast, the analysis of cell lines produced better coverage of intracellular processes such as cell cycle regulation, housekeeping metabolism, and signal transduction (Table S7a−c in the Supporting Information). This observation was further supported by a GO term enrichment analysis with GOrilla,26 which highlighted the processes specifically enriched among either the cell-line-specific or the tumor-specific proteins compared with the aggregate identifications. This analysis revealed in tumor samples a statistically significant (p < 0.001) enrichment for terms related to hemostasis, response to wounding, immune response, and intercellular trafficking and signaling. Conversely, among the proteins identified exclusively in BCCLs we observed an enrichment for enzymes involved in metabolic processes, intracellular transport and signaling, and cell division (Tables S7 and S8 in the Supporting Information). As expected, no significant enrichment was observed among proteins detected in both the cell lines and the tumors. Nonetheless, many of the proteins identified only in cell lines are essential for cell survival and are therefore likely to be expressed in tumor cells in vivo as well. Failure to detect this class of proteins was most probably caused by the greater complexity of clinical samples compared with cells cultivated in vitro. In fact, a tumor biopsy is a collection of different proteomes originating not only from the cancer itself but also from its stroma, the immune system, other blood cells, and so on. By contrast, the homogeneity of cells cultivated in vitro can be explained by the fact that they all share essentially the same genome, receive the same stimuli from the culture environment, and carry out the same basic "housekeeping" tasks.
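Overlap percentages depend on the chosen denominator. The Python sketch below (the function name `venn_counts` is hypothetical) recomputes the two-set Venn counts behind Figure 6 from the totals reported in this section. The 11.1% figure corresponds to the 790 shared proteins divided by the sum of the two totals (3681 + 3417 = 7098); dividing instead by the number of distinct proteins in the union (6308), i.e., the Jaccard index, gives about 12.5%. Either way the overlap stays near one-tenth.

```python
def venn_counts(n_tumor, n_cells, n_shared):
    """Two-set Venn counts and overlap ratios from the set sizes and their intersection."""
    tumor_only = n_tumor - n_shared
    cells_only = n_cells - n_shared
    union = tumor_only + cells_only + n_shared  # distinct proteins across both data sets
    return {
        "tumor_only": tumor_only,
        "cells_only": cells_only,
        "union": union,
        "shared_over_sum": n_shared / (n_tumor + n_cells),  # shared IDs / sum of both totals
        "shared_over_union": n_shared / union,              # Jaccard index
    }


bccl = venn_counts(3681, 3417, 790)
print(bccl["tumor_only"], bccl["cells_only"], bccl["union"])  # 2891 2627 6308
print(f"{bccl['shared_over_sum']:.1%} {bccl['shared_over_union']:.1%}")  # 11.1% 12.5%

# Adding fibroblasts and adipocytes: 3714 cell-line proteins, no additional shared IDs.
with_stroma = venn_counts(3681, 3714, 790)
print(f"{with_stroma['shared_over_sum']:.1%} {with_stroma['shared_over_union']:.1%}")  # 10.7% 12.0%
```

Because the 297 stromal-cell identifications add no shared proteins, extending the cell-line set can only lower the sum-based overlap ratio.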



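Target-versus-background GO enrichment of the kind described above is commonly scored with a one-sided hypergeometric test. The sketch below is a minimal, standard-library-only illustration of that statistic; it is not a reproduction of GOrilla's implementation (which also offers a minimum-hypergeometric mode for ranked lists), and the function names and the annotation counts in the usage example are hypothetical.

```python
from math import comb


def hypergeom_sf(k: int, M: int, n: int, N: int) -> float:
    """One-sided P(X >= k) for X ~ Hypergeometric(population M, n successes, N draws).

    M: background size (e.g., all proteins in the aggregate list)
    n: background proteins annotated with a given GO term
    N: size of the target list (e.g., tumor-specific proteins)
    k: target proteins annotated with the same GO term
    """
    if not (0 <= n <= M and 0 <= N <= M):
        raise ValueError("require 0 <= n <= M and 0 <= N <= M")
    lo = max(k, 0)
    hi = min(n, N)
    # Exact integer arithmetic; math.comb(a, b) returns 0 when b > a,
    # so impossible configurations contribute nothing to the tail.
    tail = sum(comb(n, i) * comb(M - n, N - i) for i in range(lo, hi + 1))
    return tail / comb(M, N)


def fold_enrichment(k: int, M: int, n: int, N: int) -> float:
    """Annotated fraction in the target list divided by that in the background."""
    return (k / N) / (n / M)


if __name__ == "__main__":
    # Hypothetical example: background = union of tumor and cell-line identifications,
    # target = tumor-specific proteins, GO term annotated to 120 background proteins.
    M, N = 6308, 2891
    n, k = 120, 80
    print(f"fold enrichment = {fold_enrichment(k, M, n, N):.2f}")
    print(f"one-sided p = {hypergeom_sf(k, M, n, N):.3g}")
```

A one-sided test is used because the question is over-representation in the target list. Testing many GO terms at once would additionally require a multiple-testing correction, such as the Benjamini−Hochberg procedure, which this sketch omits.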
Figure 5. Functional analysis correlation graphs. The p values are calculated from FEvER scores and express the probability that a given pathway is significantly differentially regulated on the basis of either RNA or protein intensities. Points marked in red designate theoretical pathways showing identical probability in both the proteomics and the transcriptomics data sets: (a) MDA-MB-231, (b) MCF-10A, (c) JIMT-1, (d) MCF-7, and (e) SK-BR-3.

Figure 6. Overlap of protein identifications from BCCLs and human tumors. The numbers of proteins identified in human tumors (yellow) and in the cell lines (blue) are represented as a Venn diagram. For the cell lines, data for the BCCLs only (dark blue) and for all breast cell lines (pale blue) are shown.



CONCLUSIONS The proteomes of five commonly used human BCCLs were characterized to a depth of over 3000 proteins and matched to their respective transcriptomes, revealing a substantial lack of correlation between the two data sets. This result is in agreement with previously published data21,22 and further indicates that cellular RNA levels do not reliably predict the abundance of the corresponding proteins, probably because of the different half-lives and turnover rates of RNAs and proteins30 or because of translational regulatory mechanisms. The apparent discrepancy with previous studies showing some degree of correlation23 may reflect the specific cell lines used here or the different analytical strategies adopted. It follows that molecular classifiers derived from RNA cannot be directly translated into protein biomarkers. Furthermore, because proteins are the molecular actuators of most cellular processes, transcriptomic studies might not be suitable for inferring functional information. Unlike what we previously observed in ovarian tumors,31 unsupervised clustering of the proteomics data set showed no similarity with the classification obtained from RNA profiling, which in turn resembled the classifications proposed by Perou, Sørlie, and Hu on the basis of similar analytical platforms. Consequently, although RNA levels are informative for classification purposes, their use to predict protein abundance, and thus to infer functional data, should be avoided. Additionally, we showed that the proteome of clinical samples is very different from, or at least substantially more complex than, that of cell lines grown in vitro. All of the BCCLs we analyzed seem to express a similar core proteome, constituted mainly by housekeeping proteins. This “core proteome” encompasses a large portion of the currently detectable proteins, in agreement with the results recently published by Geiger and co-workers.19 This notion is strengthened by our pathway analysis, which showed that all five BCCLs analyzed were engaged in essentially similar cellular processes. If we accept that cell lines express only what could be referred to as a “minimum standard proteome”, their suitability for proteomic studies must be re-evaluated in light of this obvious limitation. Although cells in vitro can provide a valuable model in which to dissect basic cellular processes that would otherwise remain undetectable, they appear unsuitable for studies aimed at cancer classification or biomarker discovery.

DOI: 10.1021/acs.jproteome.5b00375 J. Proteome Res. 2015, 14, 2819−2827



ASSOCIATED CONTENT

Supporting Information

Supplementary Tables 1 and 2. Identified peptides. Supplementary Table 3. Cell-line-specific proteins. Supplementary Table 4. RNA quantitative data set. Supplementary Table 5. Comparison of relative protein and mRNA levels for each of the BCCLs. Supplementary Table 6a−e. Functional profiling of the five BCCLs. Supplementary Table 7a−c. Functional profiling of the cell-line-specific and tumor-specific proteomes. Supplementary Table 8a,b. Processes significantly enriched in the cell lines and tumors. Supplementary Document 1. Description of terms and methods used. All MS data have been deposited in the EBI PRIDE database. The human tumor protein expression data are accessible under “BRCA protein subtype: Project Accession PXD000944”. The cell line protein data are available under “Human Breast Cell Index: Project Accession PXD000691”. The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.jproteome.5b00375.



AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected]. Telephone: +46 46 222 1496.

Present Address

†Department of Molecular Pharmacology and Chemistry, Memorial Sloan-Kettering Cancer Center, 1275 York Avenue, New York, NY 10065, USA.


Author Contributions

The manuscript was written through the contributions of all authors.

Funding

This work was supported by grants from the Knut and Alice Wallenberg Foundation, the Swedish Research Council (Vetenskapsrådet), and the Swedish Strategic Research Council and Vinnova to CREATE Health.

Notes

The authors declare no competing financial interest.



ABBREVIATIONS ACN, acetonitrile; AMBIC, ammonium bicarbonate; BCCL, breast cancer cell line; CL, cell line; DDA, data-dependent acquisition; DOC, deoxycholate; DTT, dithiothreitol; EDTA, ethylenediaminetetraacetic acid; FA, formic acid; FBS, fetal bovine serum; IAA, iodoacetamide; LC, liquid chromatography; MS, mass spectrometer; nOG, n-octyl glucoside; RT, room temperature (22 °C); SCX, strong cation exchange; SILAC, stable isotope labeling by amino acids in cell culture; TFA, trifluoroacetic acid; Tris, tris(hydroxymethyl)aminomethane



REFERENCES

(1) DeSantis, C.; Siegel, R.; Bandi, P.; Jemal, A. Breast cancer statistics, 2011. Ca−Cancer J. Clin. 2011, 61, 409−418.
(2) Reis-Filho, J. S.; Pusztai, L. Gene expression profiling in breast cancer: classification, prognostication, and prediction. Lancet 2011, 378, 1812−1823.
(3) Perou, C. M.; Sørlie, T.; Eisen, M. B.; van de Rijn, M.; Jeffrey, S. S.; Rees, C. A.; Pollack, J. R.; Ross, D. T.; Johnsen, H.; Akslen, L. A.; Fluge, O.; Pergamenschikov, A.; Williams, C.; Zhu, S. X.; Lønning, P. E.; Børresen-Dale, A. L.; Brown, P. O.; Botstein, D. Molecular portraits of human breast tumours. Nature 2000, 406, 747−752.
(4) Sorlie, T.; Tibshirani, R.; Parker, J.; Hastie, T.; Marron, J. S.; Nobel, A.; Deng, S.; Johnsen, H.; Pesich, R.; Geisler, S.; Demeter, J.; Perou, C. M.; Lønning, P. E.; Brown, P. O.; Børresen-Dale, A.-L.; Botstein, D. Repeated observation of breast tumor subtypes in independent gene expression data sets. Proc. Natl. Acad. Sci. U. S. A. 2003, 100, 8418−8423.
(5) Hu, Z.; Fan, C.; Oh, D. S.; Marron, J. S.; He, X.; Qaqish, B. F.; Livasy, C.; Carey, L. A.; Reynolds, E.; Dressler, L.; Nobel, A.; Parker, J.; Ewend, M. G.; Sawyer, L. R.; Wu, J.; Liu, Y.; Nanda, R.; Tretiakova, M.; Ruiz Orrico, A.; Dreher, D.; Palazzo, J. P.; Perreard, L.; Nelson, E.; Mone, M.; Hansen, H.; Mullins, M.; Quackenbush, J. F.; Ellis, M. J.; Olopade, O. I.; Bernard, P. S.; et al. The molecular portraits of breast tumors are conserved across microarray platforms. BMC Genomics 2006, 7, 96.
(6) Burdall, S.; Hanby, A.; Lansdown, M. Breast cancer cell lines: friend or foe? Breast Cancer 2003, 5 (2), 89−95.
(7) Neve, R. M.; Chin, K.; Fridlyand, J.; Yeh, J.; Baehner, F. L.; Fevr, T.; Clark, L.; Bayani, N.; Coppe, J.-P.; Tong, F.; Speed, T.; Spellman, P. T.; DeVries, S.; Lapuk, A.; Wang, N. J.; Kuo, W.-L.; Stilwell, J. L.; Pinkel, D.; Albertson, D. G.; Waldman, F. M.; McCormick, F.; Dickson, R. B.; Johnson, M. D.; Lippman, M.; Ethier, S.; Gazdar, A.; Gray, J. W. A collection of breast cancer cell lines for the study of functionally distinct cancer subtypes. Cancer Cell 2006, 10, 515−527.
(8) Lacroix, M.; Leclercq, G. Relevance of breast cancer cell lines as models for breast tumours: an update. Breast Cancer Res. Treat. 2004, 83, 249−289.
(9) Master, J. R. W. Human cancer cell lines: fact and fantasy. Nat. Rev. 2000, 1, 233−236.
(10) Soule, H. D.; Vazguez, J.; Long, A.; Albert, S.; Brennan, M. A human cell line from a pleural effusion derived from a breast carcinoma. J. Natl. Cancer Inst. 1973, 51, 1409−1416.
(11) Cailleau, R.; Olive, M.; Cruciger, A. V. J. Long-term human breast carcinoma cell lines of metastatic origin: preliminary characterization. In Vitro 1978, 14, 911−915.
(12) Engel, L. W.; Young, N. A. Human breast carcinoma cells in continuous culture: a review. Cancer Res. 1978, 38, 4327−4339.
(13) Soule, H. D.; Maloney, T. M.; Wolman, S. R.; Peterson, W. D.; Brenz, R.; McGrath, C. M.; Russo, J.; Pauley, R. J.; Jones, R. F.; Brooks, S. C. Isolation and characterization of a spontaneously immortalized human breast epithelial cell line, MCF-10. Cancer Res. 1990, 50, 6075−6086.
(14) Tanner, M.; Kapanen, A. I.; Junttila, T.; Raheem, O.; Grenman, S.; Elo, J.; Elenius, K.; Isola, J. Characterization of a novel cell line established from a patient with Herceptin-resistant breast cancer. Mol. Cancer Ther. 2004, 3, 1585−1592.
(15) Jönsson, G.; Staaf, J.; Olsson, E.; Heidenblad, M.; Vallon-Christersson, J.; Osoegawa, K.; de Jong, P.; Oredsson, S.; Ringnér, M.; Höglund, M.; Borg, Å. High-resolution genomic profiles of breast cancer cell lines assessed by tiling BAC array comparative genomic hybridization. Genes, Chromosomes Cancer 2007, 46, 543−558.
(16) Aebersold, R.; Mann, M. Mass spectrometry-based proteomics. Nature 2003, 422, 198−207.
(17) Graumann, J.; Hubner, N. C.; Kim, J. B.; Ko, K.; Moser, M.; Kumar, C.; Cox, J.; Scholer, H.; Mann, M. Stable Isotope Labeling by Amino Acids in Cell Culture (SILAC) and Proteome Quantitation of Mouse Embryonic Stem Cells to a Depth of 5111 Proteins. Mol. Cell. Proteomics 2007, 7, 672−683.
(18) Beck, M.; Schmidt, A.; Malmstroem, J.; Claassen, M.; Ori, A.; Szymborska, A.; Herzog, F.; Rinner, O.; Ellenberg, J.; Aebersold, R. The quantitative proteome of a human cell line. Mol. Syst. Biol. 2011, 7, 549.
(19) Geiger, T.; Wehner, A.; Schaab, C.; Cox, J.; Mann, M. Comparative proteomic analysis of eleven common cell lines reveals ubiquitous but varying expression of most proteins. Mol. Cell. Proteomics 2012, 11, M111.014050.
(20) Geiger, T.; Madden, S. F.; Gallagher, W. M.; Cox, J.; Mann, M. Proteomic portrait of human breast cancer progression identifies novel prognostic markers. Cancer Res. 2012, 72, 2428−2439.
(21) Cifani, P.; Bendz, M.; Wårell, K.; Hansson, K.; Levander, F.; Sandin, M.; Krogh, M.; Ovenberger, M.; Fredlund, E.; Vaapil, M.; Pietras, A.; Påhlman, S.; James, P. Hunting for protein markers of hypoxia by combining plasma membrane enrichment with a new approach to membrane protein analysis. J. Proteome Res. 2011, 10, 1645−1656.
(22) Maier, T.; Güell, M.; Serrano, L. Correlation of mRNA and protein in complex biological samples. FEBS Lett. 2009, 583, 3966−3973.
(23) Gry, M.; Rimini, R.; Strömberg, S.; Asplund, A.; Pontén, F.; Uhlen, M.; Nilsson, P. Correlations between RNA and protein expression profiles in 23 human cell lines. BMC Genomics 2009, 10, 365.
(24) Cox, J.; Mann, M. MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nat. Biotechnol. 2008, 26, 1367−1372.
(25) Kirik, U.; Cifani, P.; Albrekt, A.-S.; Lindstedt, M.; Heyden, A.; Levander, F. Multimodel pathway enrichment methods for functional evaluation of expression regulation. J. Proteome Res. 2012, 11, 2955−2967.
(26) Eden, E.; Navon, R.; Steinfeld, I.; Lipson, D.; Yakhini, Z. GOrilla: a tool for discovery and visualization of enriched GO terms in ranked gene lists. BMC Bioinf. 2009, 10, 48.
(27) Antberg, L.; Cifani, P.; Sandin, M.; Levander, F.; James, P. Critical comparison of multidimensional separation methods for increasing protein expression coverage. J. Proteome Res. 2012, 11, 2644−2652.
(28) Liu, H.; Sadygov, R. G.; Yates, J. R. A model for random sampling and estimation of relative protein abundance in shotgun proteomics. Anal. Chem. 2004, 76, 4193−4201.
(29) Mackay, A.; Weigelt, B.; Grigoriadis, A.; Kreike, B.; Natrajan, R.; A’Hern, R.; Tan, D. S. P.; Dowsett, M.; Ashworth, A.; Reis-Filho, J. S. Microarray-based class discovery for molecular classification of breast cancer: analysis of interobserver agreement. J. Natl. Cancer Inst. 2011, 103, 662−673.
(30) Schwanhäusser, B.; Busse, D.; Li, N.; Dittmar, G.; Schuchhardt, J.; Wolf, J.; Chen, W.; Selbach, M. Global quantification of mammalian gene expression control. Nature 2011, 473, 337−342.
(31) Waldemarson, S.; Krogh, M.; Alaiya, A.; Kirik, U.; Schedvins, K.; Auer, G.; Hansson, K. M.; Ossola, R.; Aebersold, R.; Lee, H.; Malmström, J.; James, P. Protein expression changes in ovarian cancer during the transition from benign to malignant. J. Proteome Res. 2012, 11, 2876−2889.
(32) Cox, J.; Neuhauser, N.; Michalski, A.; Scheltema, R. A.; Olsen, J. V.; Mann, M. Andromeda: a peptide search engine integrated into the MaxQuant environment. J. Proteome Res. 2011, 10, 1794−1805.
