AEM Accepted Manuscript Posted Online 10 April 2015 Appl. Environ. Microbiol. doi:10.1128/AEM.00402-15 Copyright © 2015, American Society for Microbiology. All Rights Reserved.
1
Title: Validation of reference genes for transcriptional analyses in Pleurotus ostreatus using RT-qPCR.
2 3
Running title: A reference gene set for P. ostreatus RT-qPCR analyses
4 5
Authors: Raúl Castanera1&, Leticia López-Varas1&, Antonio G. Pisabarro1, Lucía Ramírez1#
6
Affiliations:
7
1
8 9 10 11
Genetics and Microbiology Research Group, Department of Agrarian Production, Public University of Navarre, Pamplona, Navarre, Spain
&
Equally contributing authors
# Corresponding author E-mail:
[email protected] 12
ABSTRACT
13
Recently, the lignin-degrading basidiomycete Pleurotus ostreatus has become a widely used model for
14
fungal genomics and transcriptomics. The increasing interest in this species has led to an increasing
15
number of studies analyzing the transcriptional regulation of multigene families that encode
16
extracellular enzymes. Reverse transcription followed by real-time PCR is the most suitable technique
17
for analyzing the expression of gene sets under multiple culture conditions. In this work, we have tested
18
the suitability of 13 candidate genes for their use as reference genes in P. ostreatus time-course cultures
19
for enzyme production. We have applied three different statistical algorithms and obtained a
20
combination of stable reference genes for an optimal normalization of RT-qPCR assays. This reference
21
index can be used for future transcriptomic analyses and validations of RNA-seq or microarray data.
22
Moreover, we have analyzed the expression patterns of a laccase and a manganese peroxidase (lacc10
23
and mnp3, respectively) in lignocellulose and glucose-based media using submerged, semi-solid and
24
solid-state fermentation. By testing different normalization strategies, we demonstrate that the use of
25
non-validated reference genes as internal controls leads to biased results and misinterpretations of the
26
biological responses underlying expression changes
27 28
Key words: RT-qPCR, Reference Gene, Pleurotus ostreatus, Laccase, Manganese peroxidase,
29
Transcription
30
2
31
INTRODUCTION
32
The basidiomycete Pleurotus ostreatus is an efficient producer of extracellular enzymes such as
33
laccases (Lac; EC 1.10.3.2) and manganese peroxidases (MnP; EC 1.11.1.13), which are notable for
34
their application in multiple industrial and biotechnological processes (1, 2). It has a special relevance
35
in agroindustry due to its world-wide cultivation for mushroom production, and it has also been studied
36
for its nutritional and medicinal value (3). Currently, two genomic sequences of P. ostreatus are
37
publicly available, and it has become a very popular model for the study of fungal genetics and
38
comparative genomics (4-6). Additionally, many studies have recently been published on the
39
production of ligninolytic enzymes, with a focus on culture optimization for enzyme production. Most
40
of these enzymes are encoded by genes organized in gene families, occasionally arranged into clusters
41
as a result of gene duplications. Despite the information uncovered by sequencing efforts, there is still
42
substantial work to be performed on the functional characterization of gene family members. In this
43
sense, transcriptomic analyses are powerful tools to understand gene regulation and its presumptive
44
function. Despite the high productivity and low cost of RNA-seq, reverse transcription followed by
45
real-time PCR (RT-qPCR) is likely the best option for analyzing expression patterns of certain genes
46
under multiple conditions. Additionally, it is the most reliable technique for validating RNA-seq and
47
microarray data because of its specificity, reproducibility, and capacity for detecting and measuring tiny
48
amounts of nucleic acids in a wide quantification range (7). Since its first appearance in 1993 (8), RT-
49
qPCR has evolved into a well-established technology. Many studies have been performed to improve
50
RT-qPCR accuracy in terms of new quantification models (9), PCR efficiency calculations (10, 11) or
51
baseline determinations (12). Several types of software that implement these approaches are currently
52
available (13, 14). Within the two possible quantification strategies (relative and absolute), relative
53
quantification is a very common strategy for analyzing the expression levels of a target gene in multiple
54
conditions or samples. In this method of quantification, the mRNA levels of the target gene are
55
compared to the expression of a reference gene that passes through all the steps along with the target 3
56
gene. The use of an internal standard is essential to compensate for the variance introduced in every
57
step of the process, such as RNA quantification, reverse-transcription or pipetting (9, 15-17). The ideal
58
reference gene should have constant expression regardless of the sample, condition, cell or treatment
59
used (16, 18). Housekeeping genes such as 18S rRNA, β-actin, β-tubulin or glyceraldehyde 3-
60
phosphate dehydrogenase have been commonly used as reference genes. Unfortunately, there is an
61
overwhelming amount of studies showing that such genes are often unacceptable for RT-qPCR
62
normalization due to unstable expression (19, 20). Additionally, there is strong evidence that the use of
63
more than one reference gene is more reliable than the use of only one (7, 16, 18). The choice of
64
adequate internal standards is crucial to obtain a successful and unbiased determination of expression,
65
which is essential to analyze the expression stability of reference genes in validation assays. Several
66
methods have been proposed to analyze reference gene panels (16, 18, 20-22), and studies using these
67
methodologies in different species are becoming more frequent. In this work, we have tested the
68
suitability of 13 candidate genes for their use as reference genes in P. ostreatus using different culture
69
conditions and growing stages. We have applied three different statistical algorithms and obtained a
70
combination of reference genes that can be used for future P. ostreatus transcriptomic studies.
71
Moreover, we have analyzed the relative expression of laccase and manganese peroxidase genes
72
(lacc10 and mnp3, respectively) in lignocellulose and glucose-based media using different
73
normalization strategies. Our results demonstrate that the use of unstably expressed genes as internal
74
controls leads to biased data and misinterpretations of the biological responses underlying expression
75
changes.
76
4
77
MATHERIALS AND METHODS
78 79
Fungal strain
80
In this study, the dikaryotic strain N001 of Pleurotus ostreatus var. florida was used. The
81
genomic sequences of its monokaryotic protoclones PC15 and PC9 (23) were obtained from the JGI
82
genome portal (http://genome.jgi.doe.gov/).
83 84
Culture media and growing conditions
85
Mycelia were grown on Petri dishes with solid PDA medium (Scharlab, S.L, Barcelona, Spain)
86
and maintained in darkness at 24 °C. Inocula were prepared in Erlenmeyers containing 150 ml of SMY
87
medium (10 g of sucrose, 10 g of malt extract and 4 g of yeast extract per liter) and 4 mycelium pieces
88
of approximately 0.25 cm2. These cultures were maintained in darkness at 24 °C for 6 days in orbital
89
shaking (125 rpm) and homogenized in an omnimixer. 15 ml of the homogenized inocula were added
90
to the final media. Regarding the experimental conditions, three media were used: i) Submerged
91
Fermentation cultures (SmF) set on Erlenmeyers with 135 ml of SMY medium; ii) Semi-Solid State
92
Fermentation cultures (s-SSF) containing 10 g of milled wheat straw and 70 ml of distilled water set in
93
glass flasks with a surface to volume ratio of 1 cm-1 and iii) Solid-State Fermentation cultures (SSF) in
94
polypropylene plastic bags containing 15 g of poplar chips of approximately 200 x 50 mm and 22.5 ml
95
of distilled water. All cultures were inoculated with 15 ml of inoculum and incubated in darkness at 24
96
°C. SmF cultures were kept in orbital shaking (125 rpm), whereas s-SSF and SSF cultures were
97
incubated in static conditions.
98 99
Experimental conditions
100
To include different growth stages in our analyses, the sampling times were based on previous
101
studies analyzing variations observed during P. ostreatus growth (24). Three independent biological 5
102
replicates were sampled for each culture medium. In this sense, we sampled SmF cultures on days 1, 3,
103
7, 11 and 15; s-SSF cultures on days 3, 7, 11, 15 and 30; and SSF cultures on days 3, 7, 11, 15 and 30.
104 105
Nucleic acid extraction and real-time qPCR
106
Mycelia were harvested, frozen and ground in a sterile mortar with liquid nitrogen. Total RNA
107
was extracted from 200 mg of deep frozen tissue using Fungal RNA E.Z.N.A Kit (Omega Bio-Tek,
108
Norcross, GA, USA); its integrity was estimated by denaturing electrophoresis on 1% (w/v) agarose
109
gels. Nucleic acid concentrations were measured using a NanodropTM 2000 (Thermo Scientific,
110
Wilmington, DE, USA), and the purity of the total RNA was estimated by the 260/280 nm absorbance
111
ratio. Samples were DNase treated using 1 U of RQ1 DNase (Promega, Madison, WI, USA) per µg of
112
RNA. Subsequently, samples were purified using RNeasy MinElute Cleanup (Qiagen Iberia S.L.,
113
Madrid, Spain) and set to a final concentration of 400 ng/μl. Total RNA (800 ng per sample) was
114
reverse-transcribed into cDNA in a 20 μl volume using the iScript cDNA Synthesis kit (Bio-Rad,
115
Alcobendas, Spain). Reverse-transcription (RT) was carried out in a Thermal Cycler (MJ Research,
116
Inc.) and the protocol consisted of 5 min at 25°C, 30 min at 42°C and 5 min at 85°C.
117
RT-qPCR reactions were performed in a CFX96TM Real-Time System C1000TM (Bio-Rad
118
Laboratories, S.A.) using SYBR Green dye to detect product amplification. Primers shown in Table 1
119
were used for the amplification of the transcripts from the reference genes and the of lacc10 and mnp3
120
genes. Each reaction was set to a final volume of 20 µl containing 10 μl iQTM SYBR Green Supermix
121
(Bio-Rad Laboratories, S.A.), 2 μl of 3 μM stock forward and reverse primers (Table 1), 1 μl of a 1:20
122
dilution of RT product and 5 μl of sterile water. The amplification program consisted of 5 min at 95°C,
123
40 cycles of 15 s at 95 °C, 30 s at 63 °C and 15 s at 72 °C, followed by a final melting-curve with
124
increments of 0.5 °C every 5 s in a linear gradient of 65-95 °C. The specificity of each reaction was
125
confirmed by the inspection of melt curve profiles. Each reaction was performed in triplicate, and Non
126
Template Controls (NTCs) were included for each primer set. An experimentally validated Interplate 6
127
Calibrator (IPC) was used to compensate inter-plate variation. Crossing-point values (Cp) and RFUs
128
were recorded, and the latter were used to calculate amplification efficiencies by linear regression using
129
the LinReg program (25).
130 131
Reference gene panel
132
We selected 13 genes of different functional classes as candidates (Table 2). Our gene panel
133
contained i) housekeeping genes such as tubulin (tub), actin (actin1) or cytochrome-c, (cyt-c); ii) genes
134
chosen based on stable expression in several RNA-seq and qPCR experiments such as lipase (lip) or
135
methylthioadenosine phosphorylase (phos) (26, 27); and iii) the sar1 gene, described as the ideal
136
reference gene in the ascomycetes Aspergillus niger
137
candidates were classified according to their suitability as internal standards using the geNorm (18),
138
NormFinder (16) and BestKeeper (21) algorithms.
(28) and Trichoderma reesei (29). The 13
139 140
Data normalization and quantification strategy
141
Data pre-processing was carried out using GenEx software (http://www.multid.se) and included
142
interplate-calibration, efficiency correction, and normalization using different sets of reference genes
143
and the calculation of relative quantities (RQ, formula 1) as well as the relative expression ratios
144
(Expression, Formula 2). SmF samples harvested at day 1 were used as control samples in the latter
145
quantification.
146
Formula 1
147
Formula 2
148
where CpE is the efficiency corrected crossing point. ΔCpE in Formula 1 represents the difference
149
between the CpE of the target gene and CpE of the mean CpE of the reference genes. The expression
150
value of Formula 2 represents the classical ΔΔCt method (15).
RQ = 2 − ΔCp E
Expression = 2 − ( ΔΔ Cp E )
7
151
To analyze the impact of reference gene selection in the relative expression calculations, we
152
obtained the expression of the laccase lacc10 and the manganese-peroxidase mnp3 using different
153
normalization strategies: i) The mean expression of 12 genes, ii) the mean expression of the top three
154
reference genes from the ranking, iii) the expression of the most stable reference gene, iv) the
155
expression of the least stable reference gene, and v) the amount of total RNA (no reference genes).
156 157
Statistical analyses
158
Heatmap and hierarchical clustering were performed on autoscaled expression data using Wards'
159
algorithm and the Euclidean distance. A Mann Whitney test was used to uncover significant differences
160
in the relative expression ratios calculated using the different normalization strategies. All statistical
161
analyses were performed using GenEx.
162
8
163
RESULTS AND DISCUSSION
164 165
The expression level of the 13 genes was measured in each of the 15 experimental conditions
166
(see Materials and Methods). The mean crossing point values (Cp) ranged from 18 to 25 cycles (Fig 1).
167
The standard deviation among the 15 different conditions was clearly variable, ranging from 0.64
168
cycles in vsn to 1.66 cycles in lip. The gene chs showed the lowest expression (highest Cp value, 28.04
169
cycles), whereas actin1 showed the highest expression (lowest Cp value, 16.71 cycles). Other
170
housekeeping genes such as tub, cyt or gapdh1 also showed high levels of transcription in concordance
171
with the important roles that they play in most cellular processes.
172 173
Stability of reference genes: NormFinder, geNorm and Bestkeeper analyses.
174
The NormFinder algorithm identified sar1 as the most stable reference gene based on its lower
175
SDN (NormFinder standard deviation) in comparison to the other candidates (0.27 cycles). Interestingly,
176
the gene showing lower inter-sample variation in the exploratory analysis of Cp values (vsn gene, Fig
177
1) was ranked 10th, evidence that raw Cp inter-sample variation itself is not a reliable indicator of
178
expression stability. The accumulated standard deviation (Acc.SDN) was explored to determine the
179
optimal number of reference genes to be used. Our results showed that a reference index composed of
180
12 genes (sar1, phos, pep, lip, cyph, cyt, gapdh1, actin1, chs, vsn, pas and tub) would be the most
181
appropriate for data normalization according to its low Acc.SDN (0.23 cycles) (Fig 2B). Nevertheless,
182
the difference between this index and another composed of the best three reference genes (sar1, phos,
183
pep) was only 0.03 cycles. Thus, the use of three reference genes for normalization (0.26 Acc.SDN) is
184
reasonable and avoids the increased cost and the loss of efficiency of using 12 genes. To obtain more
185
robust results, data were analyzed again taking into account the intra and inter-group variations in gene
186
expression among the three culture mediums (Tables S1 and S2). Genes showing high inter-group
187
variation (cutoff SDN of 0.65 cycles) were discarded because including unstable genes in the analysis 9
188
would lead to a less reliable ranking of expression stability. After removing tub, vsn, pas and actin1,
189
NormFinder was run again, considering all of the samples to form one group (Fig 2C). The results
190
showed a very similar profile in comparison to the first run. The gene sar1 was again the best and was
191
classified with an SDn of 0.29 cycles, followed by phos and pep; cyc was the least stable. Nevertheless,
192
the analysis of accumulated SDn showed that in this instance, the ideal reference index was composed
193
of three genes, displaying an Acc.SDN of 0.24 cycles (Fig 2D). This deviation is useful for evaluating
194
the systematic error introduced when normalizing RT-PCR data. In our case, this value falls in the
195
range of variability that is often found between technical replicates. Thus, we can conclude that the
196
systematic error (Acc.SDN) introduced by our proposed reference index (average of sar1, phos and
197
pep) is admissible for an accurate expression analysis. To give more strength to the results, we analyzed
198
the same dataset with the geNorm and Bestkeeper programs. The geNorm tool identified the
199
combination of sar1 and pep as the most stable gene-pair (M value of 0.62), and cyc was again
200
identified as the least stable candidate (Fig 3). Because Bestkeeper can analyze up to ten genes, we
201
chose the top ten candidates from the NormFinder classification as input data (80% of them were also
202
within the top ten in geNorm ranking). According to the descriptive statistics released by Bestkeeper,
203
sar1 and phos displayed the lowest coefficient of variance (3.97% y 3.67%, respectively) and standard
204
deviation (0.80 cycles and 0.87 cycles, respectively) (Table 3). Following the authors´ considerations
205
(21), genes showing a standard deviation greater than one cycle were considered inconsistent and
206
removed from the analysis; the remainder were used in further analyses. The Bestkeeper algorithm
207
calculates the pairwise correlations between Cp values and the Bestkeeper Index (BI) based on the
208
geometric mean of stable genes (SD < 1). Thus, genes showing a higher correlation (r) and a significant
209
p value (p < 0.05) were considered the most stable. According to this, sar1 was the first to be classified
210
in the ranking (r = 0.94, p < 0.05). The high correlation between this gene and the BI undermines its
211
stable expression and thus its suitability as a reference gene.
212 10
213
Reference genes´ stability ranking
214
According to the four methodologies used, sar1 was the most stable candidate gene (Table 4).
215
The reference gene pep appeared at the top three positions in all the rankings, similar to phos, with the
216
exception of geNorm results where it occupied the 5th place. When comparing the NormFinder top six
217
reference genes with the other methodologies, five of the six genes were the same in the Bestkeeper
218
results, whereas four of the six were the same in the geNorm results. Small discrepancies between the
219
three algorithms are common (30-33), nevertheless, results are usually consistent within the best
220
classified genes, as found in this study. Interestingly, several housekeeping genes such actin1, gapdh1
221
or tub are among the less stable reference genes tested. These genes have been widely used as internal
222
standards for normalization of RT-PCR and RT-qPCR data. During the last decade, many studies have
223
revealed the instability of housekeeping genes´ expression (31, 34, 35), although they can also be
224
acceptable internal standards under certain conditions. This fact is evidence that gene expression
225
stability varies with the gene, species, tissue and culture conditions used (36, 37). In summary, our
226
results reinforce the fact that reference genes should never be chosen on the unique basis of suitability
227
in other organisms. Therefore, if no data are available relative to a given species, experimental
228
validation should always be the first step in RT-qPCR expression studies. In this sense, geNorm,
229
NormFinder and Bestkeeper are the three methods commonly used for analyzing the stability of
230
reference gene panels. Other statistical methods have been developed with the same aim (20, 22, 38)
231
but are used less frequently, most likely due to their more complex application. Bestkeeper has a
232
limitation in the number of genes that can be analyzed. This is inconvenient when attempting to
233
analyze larger (and thus more robust) reference gene panels. The geNorm tool calculates the best pair
234
of genes based on the variation with the other genes tested. In our case, its results were the most
235
discordant. In general, the three methods are susceptible to producing the wrong results when
236
introducing co-regulated genes. The choice of the most appropriate tool is a difficult task, and due to
237
the lack of a gold standard at this issue, the use of several algorithms represents a reasonable way to 11
238
proceed. Normalization of RT-qPCR data based on multiple reference genes allows for a more precise
239
evaluation of gene expression than single-gene correction (16, 18, 21). In this sense, the balance
240
between costs and benefits plays an important role in the number of genes selected for normalization.
241
The analysis of the optimal number of reference genes to be used revealed that the difference in
242
Acc.SDN is not very large when comparing three versus one gene (0.24 vs. 0.29 cycles, Fig 2D).
243
Nevertheless, the difference between using one versus three genes can be decisive in case of pipetting
244
errors while setting up the reference gene reaction. Taking into account the results obtained using the
245
three algorithms, a reference index formed by sar1, pep and phos could be considered the ideal internal
246
standard for normalizing RT-qPCR data in P. ostreatus species. Interestingly, the best reference gene
247
detected in our study was sar1, an orthologue of the sar1 GTPase of Trichoderma reesei, a species in
248
which it was described as an ideal reference gene (29). According to the conserved domains of this
249
protein, sar1 participates in important biological processes such as GTP binding, intracellular transport,
250
protein secretion and signal transduction. Thus, it seems reasonable for the organism to maintain its
251
constitutive transcription. This gene has also been validated as stable in the ascomycete Aspergillus
252
niger (28). According to its exceptional intra-group stability (SDn < 0.1 in every medium, Table S1), its
253
use as a single reference gene could be justified (although not recommended) when comparing the
254
expression of time-course samples from the same culture medium. In order to extend our results to
255
other fungal species we performed an exploratory analysis using publicly available microarray data
256
(Supplemental methods, Fig S1). The level of expression of sar1, pep and phos compared to the mean
257
expression of all genes evidenced the apparent expression stability of our candidates (especially of
258
sar1) in the fungal models Phanerochaete chrysosporium, Coprinopsis cinerea and Postia placenta.
259
These results should be taken with caution; they are not a direct validation of their general suitability
260
for normalization of RT-qPCR data, as the exploratory approach used in these fungal models lacks of
261
sufficient statistical robustness (i.e. sufficient samples of different media or developmental stages).
262
Nevertheless, according to the results obtained we propose that genes pep, phos, and specially sar1 are 12
263
good candidates to be included in RT-qPCR reference gene panels to be further validated in these
264
species.
265 266
Impact of normalization strategy on gene expression quantification.
267
To uncover the impact of the selection of reference genes on gene expression calculations, we
268
analyzed the expression of lacc10 and mnp3 in all of the samples using five normalization strategies
269
(Table 5). We obtained time-course expression ratios (Formula 2) for use as reference samples in
270
mycelium cultured on SmF medium harvested at day 1. The results showed clear evidence that the
271
normalization strategy strongly influences expression results. In fact, in some cases, the expression
272
ratios showed totally opposite patterns when normalized with validated versus non-validated reference
273
genes. This was particularly evident in the expression of both genes in s-SSF and SSF media (Fig 4,
274
Fig 5). For example, at day 15 of s-SSF culture, lacc10 showed an overexpression of 3.1 log2-fold
275
when using alternative 2 (the optimal reference index); nevertheless, the use of the single-gene cyc
276
(alternative 4) yielded a repression of -1.25 log2-fold (Fig 4). This represents a very important
277
difference between both normalizations using the same input data, which can lead to entirely incorrect
278
conclusions. In general, normalization with unstable reference indexes (alternatives three to five) led to
279
an underestimation of the expression values. As expected, the results obtained with the first two
280
alternatives yielded strong correlations of 0.978 in mnp3 and 0.995 in lacc10, which substantially
281
decreased when using indexes of lower quality (Table S3, Table S4, Supplementary information). The
282
five gene expression sets (one for each normalization strategy) were further analyzed using hierarchical
283
clustering combined with a heatmap (Fig 6). The exploratory analysis separated two main clusters
284
corresponding to the lacc10 and mnp3 genes, regardless of the normalization strategy. Within each
285
group, alternatives I and II always clustered separate from alternatives IV and V. The Mann-Whitney
286
test confirmed that there were not significant differences (p