Accepted Manuscript

Deep phylogenetic incongruence in the angiosperm clade Rosidae

Miao Sun, Douglas E. Soltis, Pamela S. Soltis, Xinyu Zhu, J. Gordon Burleigh, Zhiduan Chen

PII: S1055-7903(14)00387-X
DOI: http://dx.doi.org/10.1016/j.ympev.2014.11.003
Reference: YMPEV 5067
To appear in: Molecular Phylogenetics and Evolution
Received Date: 10 June 2014
Revised Date: 1 November 2014
Accepted Date: 5 November 2014
Please cite this article as: Sun, M., Soltis, D.E., Soltis, P.S., Zhu, X., Gordon Burleigh, J., Chen, Z., Deep phylogenetic incongruence in the angiosperm clade Rosidae, Molecular Phylogenetics and Evolution (2014), doi: http://dx.doi.org/10.1016/j.ympev.2014.11.003
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Deep phylogenetic incongruence in the angiosperm clade Rosidae

Miao Sun [a,b], Douglas E. Soltis* [c,d,e], Pamela S. Soltis [d,e], Xinyu Zhu [f], J. Gordon Burleigh [c,e], Zhiduan Chen* [a]

[a] State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, the Chinese Academy of Sciences, Beijing 100093, China;
[b] Graduate University of the Chinese Academy of Sciences, Beijing 100039, China;
[c] Department of Biology, University of Florida, Gainesville, FL 32611, USA;
[d] Florida Museum of Natural History, University of Florida, Gainesville, FL 32611, USA;
[e] University of Florida Genetics Institute;
[f] School of Life Science, Nantong University, Nantong 226007, China;
*Corresponding authors:
Zhiduan Chen: [email protected], 8610-62836090
Douglas E. Soltis: [email protected], (1)-352-273-1963

Running title: Deep phylogenetic incongruence in Rosidae

Data Archival Location: Dryad (XXXX)
Abstract
Analysis of large data sets can help resolve difficult nodes in the tree of life and also reveal complex evolutionary histories. The placement of the Celastrales-Oxalidales-Malpighiales (COM) clade within Rosidae remains one of the most confounding phylogenetic questions in angiosperms, with previous analyses placing it with either Fabidae or Malvidae. To elucidate the position of COM, we assembled multi-gene matrices of chloroplast, mitochondrial, and nuclear sequences, as well as large single- and multi-copy nuclear gene data sets. Analyses of multi-gene data sets demonstrate conflict between the chloroplast and both nuclear and mitochondrial data sets, and the results are robust to various character-coding and data-exclusion treatments. Analyses of single- and multi-copy nuclear loci indicate that most loci support the placement of COM with Malvidae, fewer loci support COM with Fabidae, and almost no loci support COM outside a clade of Fabidae and Malvidae. Although incomplete lineage sorting and ancient introgressive hybridization remain as plausible explanations for the conflict among loci, more complete sampling is necessary to evaluate these hypotheses fully. Our results emphasize the importance of genomic data sets for revealing deep incongruence and complex patterns of evolution.
Keywords
Hybridization; introgression; incomplete lineage sorting; COM clade; incongruence; phylogenomics
1. Introduction
Genome-scale data can provide the power to resolve some of the most perplexing parts of the tree of life (e.g., Dunn et al., 2008; Lee et al., 2011; Simon et al., 2012; Smith et al., 2011; Yoder et al., 2013). Furthermore, estimates from numerous independent loci also can reveal phylogenetic incongruence caused by different evolutionary processes, such as gene duplication and loss, recombination, hybridization, lateral gene transfer, or incomplete lineage sorting (e.g., Cui et al., 2013; Degnan and Rosenberg, 2009; Doyle, 1992; Goodman et al., 1979; Hudson, 1983; Maddison, 1997; Oliver, 2013). Molecular phylogenetic analyses have resolved much of the backbone angiosperm phylogeny (e.g., Ruhfel et al., 2014; Soltis et al., 2009, 2011) and clarified long-standing questions regarding relationships within major clades such as monocots (Monocotyledoneae; Chase et al., 2000; Givnish et al., 2006, 2010; Graham et al., 2006; Jerrold et al., 2004; Saarela et al., 2008), asterids (Asteridae; Albach et al., 2001; Bremer et al., 2001, 2004; Hilu et al., 2003; Moore et al., 2011; Olmstead et al., 2000), and rosids (Rosidae; Hilu et al., 2003; Jansen et al., 2007; Moore et al., 2010; Qiu et al., 2010; Soltis et al., 2007, 2011; Wang et al., 2009). Yet much of this work is based either largely or exclusively on chloroplast sequence data, which represent a single, linked, and usually maternally inherited locus. New sequencing technologies make it feasible to obtain data sets of numerous independent nuclear loci, which can be used to evaluate results from analyses of chloroplast gene sequence data and reveal phylogenetic conflict among loci (e.g., Burleigh et al., 2011; Duarte et al., 2010; Lee et al., 2011; Xi et al., 2014; Zeng et al., 2014).

Introgressive hybridization has played an important role in plant evolution, and incomplete lineage sorting also likely occurred during some rapid radiations. Consequently, there are numerous examples of discordance between chloroplast and nuclear gene trees in plants (e.g., Acosta and Premoli, 2010; Okuyama et al., 2005; Rieseberg and Soltis, 1991; Rieseberg and Wendel, 1993; Rieseberg et al., 1995, 1996a; Soltis and Kuzoff, 1995; Soltis and Soltis, 2009; Tsitrone et al., 2003; Wendel et al., 1995; Xi et al., 2014). Although phylogenetic analyses of angiosperm backbone relationships based on nuclear, mitochondrial, and chloroplast loci have largely agreed, one major point of conflict is the placement of COM (Celastrales-Oxalidales-Malpighiales; Endress and Matthews, 2006; Zhu et al., 2007) within the large Rosidae clade.
Rosidae comprise approximately one quarter of all angiosperm species, which are morphologically diverse, exhibit extraordinary heterogeneity in habit, habitat, and life form, and include most temperate and tropical forest trees (Wang et al., 2009). Some members possess novel biochemical pathways (e.g., production of glucosinolates and cyanogenic glycosides for defense), and many are important crops (e.g., Fabaceae and Rosaceae). Symbioses with nitrogen-fixing bacteria are largely confined to this clade as well. Resolving relationships within Rosidae has been difficult (e.g., Hilu et al., 2003; Jansen et al., 2007; Lee et al., 2011; Moore et al., 2010, 2011; Qiu et al., 2010; Ruhfel et al., 2014; Soltis et al., 2005, 2007, 2011; Wang et al., 2009; Zhu et al., 2007) due to a series of rapid radiations (Wang et al., 2009). However, multi-gene studies have recovered two major, well-supported clades: Fabidae (i.e., eurosids I, fabids) and Malvidae (i.e., eurosids II, malvids) (Hilu et al., 2003; Judd and Olmstead, 2004; Moore et al., 2010, 2011; Soltis et al., 1999, 2000, 2005, 2007, 2011; Wang et al., 2009; Xi et al., 2014).
COM contains approximately one third of all Rosidae, 870 genera and ~19,000 species (APG III, 2009). Molecular analyses, largely dominated by chloroplast genes, have usually placed COM with Fabidae (Table 1; e.g., Burleigh et al., 2009; Hilu et al., 2003; Jansen et al., 2007; Moore et al., 2010, 2011; Soltis et al., 2005, 2007, 2011; Wang et al., 2009). Analyses of the mitochondrial gene matR first suggested the placement of COM with Malvidae (Zhu et al., 2007), and subsequent studies based on nuclear or mitochondrial genes supported this placement, although typically with limited taxon sampling (Table 1; Burleigh et al., 2011; Duarte et al., 2010; Finet et al., 2010; Lee et al., 2011; Morton, 2011; Qiu et al., 2010; Shulaev et al., 2010; Xi et al., 2014; Zhang et al., 2012). Several floral characters also appear to link COM with Malvidae. For example, in COM and Malvidae species, the inner integument of the ovule is thicker than the outer integument at the time of fertilization, a feature that is extremely rare in Fabidae and other eudicots. Additionally, contorted petals and a tendency towards polystemony and polycarpy also suggest a placement of COM members with Malvidae rather than with Fabidae (Endress and Matthews, 2006; Endress et al., 2013).
Although analyses of chloroplast gene sequence data generally appear to conflict with analyses of mitochondrial and nuclear gene sequence data, these studies often differ greatly in taxon sampling and analytical methods (Table 1; but see Xi et al., 2014). Thus, it is unclear whether the different placements of COM are due to errors in the analyses or biological incongruence among loci. The level of incongruence within the nuclear genome also is unknown. We use COM as an exemplar to investigate phylogenetic incongruence at deep levels in angiosperm phylogeny. Specifically, we first compare phylogenetic results from chloroplast, mitochondrial, and nuclear data sets having similar taxon sampling and examine whether the results are robust to various character-coding and data-exclusion protocols. We also survey large-scale nuclear data sets of both single-copy and multi-copy genes to investigate the patterns of phylogenetic discordance within the nuclear genome and then discuss whether these patterns are consistent with incomplete lineage sorting (i.e., deep coalescence) (Maddison, 1997; Maddison and Knowles, 2006; Page and Charleston, 1998) or ancient hybridization and introgression (Chang et al., 2011; Cui et al., 2013; Linder and Rieseberg, 2004; Tsitrone et al., 2003; Zhang et al., 2014).
2. Materials and methods
Throughout this paper, to facilitate discussion, we treat COM, Fabidae, and Malvidae as three separate groups, despite current classifications that consider COM to be part of Fabidae (APG III, 2009; Cantino et al., 2007).
2.1 Phylogenetic analyses of chloroplast, mitochondrial, and nuclear data
To compare the placement of COM in analyses of chloroplast, mitochondrial, and nuclear gene data sets, we assembled published matrices with similar taxon sampling. For the chloroplast gene sequence data, we pruned the 78-gene chloroplast data set of Ruhfel et al. (2014) to 82 seed plant taxa. We also used the 92-taxon, 5-gene nuclear data set of Zhang et al. (2012), and the 79-taxon, 4-gene mitochondrial matrix of Qiu et al. (2010). The taxon sampling in all of these studies was designed to reconstruct relationships across angiosperms using representative sampling of major clades, including COM. We used the nuclear gene sequence data set of Zhang et al. (2012) to guide our assembly of chloroplast and mitochondrial gene data sets, attempting to ensure as much as possible that the taxa employed from these data sets are from the same species or genus.
We performed a series of Maximum Likelihood (ML) phylogenetic analyses on each of the three data sets using RAxML v.7.2.8 (Stamatakis, 2008). For all analyses, we estimated the optimal ML tree and performed 100 nonparametric bootstrap (BS) replicates. First, we analyzed the full nucleotide alignments using an unpartitioned GTRCAT model. We also examined the variation of COM placement in single-gene topologies inferred from these three data sets using RAxML with the GTRCAT model. For the three multi-gene data sets, we also analyzed the amino acid (AA) alignment using the PROTCATJTT model (Jones et al., 1992). RY coding, which recodes the nucleotides as binary characters, either purines (A or G = R) or pyrimidines (C or T = Y), has been used to ameliorate biases caused by saturation, rate heterogeneity, and base composition (Delsuc et al., 2005; Gibson et al., 2005; Harrison et al., 2004; Phillips et al., 2004; Phillips and Penny, 2003). Thus, we also transformed the three full nucleotide matrices to RY coding and ran an ML analysis using the GTRCAT model.

The elimination of potentially misleading sites from an alignment is a common practice in phylogenetic analysis (e.g., Delsuc et al., 2005; Philippe et al., 2005; Rajan, 2013; Regier and Zwick, 2011). We used two methods to remove highly variable sites as a further means of exploring the data that may contribute to the discordant placements of COM. First, following Goremykin et al. (2010), we organized the nucleotide sites in each alignment in order of rate based on the observed variability (OV) criterion. For each sorted alignment, we then removed the most variable 5%, 10%, 20%, 30%, 40%, and 50% of the sites. After each removal, we performed an ML analysis on the remaining sites in each alignment using the GTRCAT model. Second, we excluded the third codon positions and analyzed the alignments of the first and second codon positions only using RAxML with the GTRCAT model.
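The recoding and site-exclusion treatments described above can be illustrated with a minimal Python sketch. It is not the code used in this study. The function names are hypothetical, the alignment is assumed to be an in-frame dictionary of equal-length sequences, and the variability score (the fraction of mismatching pairwise comparisons at a site, with gaps and missing data skipped) is a simplified stand-in for the OV criterion of Goremykin et al. (2010).

```python
from itertools import combinations

# Purines (A, G) become R and pyrimidines (C, T/U) become Y; any other
# symbol (gaps, ambiguity codes) is passed through unchanged.
RY_MAP = {"A": "R", "G": "R", "C": "Y", "T": "Y", "U": "Y"}


def ry_recode(seq):
    """Recode a nucleotide sequence into the two-state RY alphabet."""
    return "".join(RY_MAP.get(base, base) for base in seq.upper())


def drop_third_positions(seq):
    """Keep codon positions 1 and 2 (assumes the sequence starts in frame)."""
    return "".join(base for i, base in enumerate(seq) if i % 3 != 2)


def observed_variability(column):
    """Fraction of pairwise comparisons in one column that mismatch.

    Assumption: gaps and missing data ('-', '?', 'N') are ignored.
    """
    states = [c for c in column.upper() if c not in "-?N"]
    pairs = list(combinations(states, 2))
    if not pairs:
        return 0.0
    return sum(a != b for a, b in pairs) / len(pairs)


def remove_most_variable(alignment, fraction):
    """Drop the given fraction of sites with the highest variability score."""
    n_sites = len(next(iter(alignment.values())))
    columns = ["".join(seq[i] for seq in alignment.values())
               for i in range(n_sites)]
    ranked = sorted(range(n_sites),
                    key=lambda i: observed_variability(columns[i]),
                    reverse=True)
    n_drop = int(n_sites * fraction)
    keep = sorted(set(range(n_sites)) - set(ranked[:n_drop]))
    return {name: "".join(seq[i] for i in keep)
            for name, seq in alignment.items()}
```

Calling `remove_most_variable(alignment, 0.05)` through `remove_most_variable(alignment, 0.50)` would produce the series of reduced matrices, each of which would then be passed to the ML analysis.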
2.2 Single-copy nuclear gene analysis
The largest nuclear gene data set used to resolve the backbone of angiosperm relationships comprises 22,833 groups of orthologs (Lee et al., 2011). Although this data set includes only seven species of Malpighiales representing COM (no Celastrales or Oxalidales species were included), it provides estimates from by far the greatest number of presumably independent nuclear loci for the placement of COM. We examined the individual gene trees from this data set to look for variation in the placement of COM. First, we divided the full, concatenated nucleotide alignment from Lee et al. (2011; available on the BIGPLANT website: http://nybg.bio.nyu.edu/) into separate alignments, each representing a set of putative orthologs. Next we identified the ortholog sets that were potentially informative regarding the placement of COM; these alignments contained at least one COM species, one Malvidae, one Fabidae, and one other species not in any of these groups. In all, 8,445 of the ortholog sets were potentially informative regarding the placement of COM. For each potentially informative ortholog set alignment, we performed 100 ML bootstrap replicates using RAxML v.7.2.8 with the GTRCAT model (Stamatakis, 2008), and we counted how many bootstrap replicates support a clade of COM and Fabidae species, how many support a clade of COM and Malvidae species, and how many support COM outside a clade of Fabidae and Malvidae. The analysis of the support for the COM placement was automated using Perl scripts and Newick utilities (Junier and Zdobnov, 2010).
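The filtering and counting steps above can be sketched in Python as follows. This is an illustration only, not the Perl scripts or Newick utilities workflow used in the study. It assumes that each bootstrap tree is a Newick string already rooted on a taxon outside the three groups, that every sampled member of a group must fall in the clade for that clade to count, and that `group_of` is a hypothetical dictionary mapping each taxon to "COM", "Fabidae", "Malvidae", or "other".

```python
import re
from collections import Counter


def clades(newick):
    """Return the leaf set of every internal node of a rooted Newick tree."""
    tokens = re.findall(r"[(),;]|[^(),;]+", newick.strip())
    stack, found, prev = [], [], None
    for tok in tokens:
        if tok == "(":
            stack.append(set())
        elif tok == ")":
            done = frozenset(stack.pop())
            found.append(done)
            if stack:
                stack[-1].update(done)
        elif tok not in ",;" and prev != ")":
            # A leaf label; drop any branch length after the colon.
            stack[-1].add(tok.split(":")[0].strip())
        # Labels directly after ")" are support values or lengths: skipped.
        prev = tok
    return found


def is_informative(taxa, group_of):
    """True if COM, Fabidae, Malvidae, and another group are all sampled."""
    present = {group_of[t] for t in taxa}
    return {"COM", "Fabidae", "Malvidae", "other"} <= present


def placement(newick, group_of):
    """Classify one rooted tree by the clade that contains COM."""
    all_clades = clades(newick)
    tips = max(all_clades, key=len)  # the root clade holds every tip
    members = {}
    for tip in tips:
        members.setdefault(group_of[tip], set()).add(tip)
    com = members.get("COM", set())
    fab = members.get("Fabidae", set())
    mal = members.get("Malvidae", set())
    candidates = {
        "COM+Fabidae": com | fab,
        "COM+Malvidae": com | mal,
        "COM outside": fab | mal,
    }
    for label, wanted in candidates.items():
        if frozenset(wanted) in all_clades:
            return label
    return None  # the tree supports none of the three placements


def count_support(bootstrap_trees, group_of):
    """Number of bootstrap replicates supporting each placement."""
    return Counter(placement(tree, group_of) for tree in bootstrap_trees)
```

Because the three candidate clades are mutually incompatible, at most one of them can occur in any single tree, so each replicate contributes to at most one placement.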
2.3 Multi-copy nuclear gene analysis
We also examined support for the placement of COM using multi-copy nuclear gene families, i.e., gene families that may have multiple sequences from one or more taxa. Unlike trees from single-copy sets of orthologs, trees from multi-copy gene families are not always straightforward to interpret. For example, in a multi-copy gene tree, a species from COM could have one sequence that groups with Fabidae and one sequence that groups with Malvidae. To solve this problem, we estimated the reconciliation cost of each gene family tree given a topology with COM sister to Fabidae, COM sister to Malvidae, and COM sister to a Fabidae + Malvidae clade. We used three different reconciliation costs, each implying a different evolutionary scenario: 1) the minimum number of implied gene duplications; 2) the minimum number of implied duplications and losses; and 3) the minimum number of implied deep coalescence events (e.g., Maddison, 1997). We used a parsimony criterion to distinguish among the three species tree topologies; the topology with the lowest reconciliation cost, i.e., the topology that implies the fewest evolutionary events, is the topology that is supported by the gene family. If two or three of the topologies have equal reconciliation costs, the gene family is considered uninformative regarding the placement of COM.
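The parsimony criterion described above can be illustrated with a small Python sketch covering only the first cost, the number of implied duplications. It is not the OptRoot program, and it makes several assumptions: trees are rooted, binary, and written as nested tuples; gene labels follow a hypothetical "Species_copy" convention; duplications are counted by the standard least-common-ancestor mapping; and a gene family is treated as uninformative only when two or more topologies tie for the lowest cost.

```python
def leaf_set(tree):
    """Set of leaf labels below a node of a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def species_clades(species_tree):
    """All clades (leaf sets) of the species tree, including single tips."""
    out = [leaf_set(species_tree)]
    if not isinstance(species_tree, str):
        for child in species_tree:
            out.extend(species_clades(child))
    return out


def lca(clade_list, species):
    """Smallest species-tree clade containing every species in the set."""
    return min((c for c in clade_list if species <= c), key=len)


def duplications(gene_tree, clade_list, species_of):
    """Return (duplication count, LCA mapping) for a rooted gene tree."""
    if isinstance(gene_tree, str):
        return 0, lca(clade_list, frozenset([species_of(gene_tree)]))
    total, child_maps = 0, []
    for child in gene_tree:
        n, mapped = duplications(child, clade_list, species_of)
        total += n
        child_maps.append(mapped)
    here = lca(clade_list, frozenset().union(*child_maps))
    if here in child_maps:
        # A node mapped to the same species-tree node as one of its
        # children implies a gene duplication.
        total += 1
    return total, here


def supported_topology(costs):
    """Topology with the uniquely lowest cost, or None for a tie."""
    best = min(costs.values())
    winners = [name for name, cost in costs.items() if cost == best]
    return winners[0] if len(winners) == 1 else None


# Toy example with hypothetical data: one gene copy per species.
topologies = {
    "COM+Fabidae": ("Out", (("Populus", "Glycine"), "Arabidopsis")),
    "COM+Malvidae": ("Out", (("Populus", "Arabidopsis"), "Glycine")),
    "COM outside": ("Out", ("Populus", ("Glycine", "Arabidopsis"))),
}
gene_tree = ("Out_1", (("Populus_1", "Arabidopsis_1"), "Glycine_1"))
costs = {
    name: duplications(gene_tree, species_clades(tree),
                       lambda g: g.split("_")[0])[0]
    for name, tree in topologies.items()
}
```

In this toy case the gene tree matches the COM + Malvidae species tree exactly, so that topology needs no duplications while each alternative needs one, and `supported_topology(costs)` selects COM + Malvidae.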
We assembled a collection of 3,748 gene family alignments obtained from the genome sequences of 22 plant taxa with OrthoMCL (Chen et al., 2006) and aligned with MAFFT (Katoh et al., 2005). Included are Selaginella moellendorffii, Physcomitrella patens, and 20 angiosperm species, including one species representing COM, Populus trichocarpa. Although the taxon sampling is sparse, using only sequences from completely sequenced genomes may enable more accurate estimates of processes such as gene loss than incomplete transcriptome data sets.

For each multi-copy gene alignment, we performed 100 bootstrap replicates using RAxML v.7.2.8 with the GTRCAT model (Stamatakis, 2008). For each of the resulting bootstrap trees, we calculated the reconciliation cost under the three different cost models (duplications, duplications and losses, and deep coalescence) using a species tree in which Populus (COM clade) was sister to Fabidae (Fragaria vesca, Medicago truncatula, and Glycine max), one in which Populus was sister to Malvidae (Arabidopsis thaliana, Thellungiella parvula, Carica papaya, and Theobroma cacao), and one in which Populus was sister to Fabidae + Malvidae. The rooting of gene trees can greatly affect the estimates of the reconciliation cost, and it is often difficult to infer the root of a multi-copy gene tree. Thus, for each gene tree, we used a rooting that minimized the reconciliation cost. We calculated the reconciliation costs for each gene tree bootstrap replicate under three possible species trees using the program OptRoot, written by Andre Wehe and available at http://www.wehe.us/optroot.html.

All data sets and results are available on Dryad (XXX; www.datadryad.org).
3. Results
3.1 Chloroplast, mitochondrial, and nuclear data sets
ML analyses of the chloroplast, mitochondrial, and nuclear multi-gene alignments with similar taxon sampling recover different placements of COM (Figures 1–3). We focus on the relationships among members of Rosidae, but all trees generated in the present study are available as supplemental data and on Dryad (XXXX).
The phylogeny based on the 82-taxon, 78-gene chloroplast data set largely agrees with conclusions from previous chloroplast-dominated studies (APG III, 2009; Moore et al., 2010; Ruhfel et al., 2014; Soltis et al., 2011; Wang et al., 2009), supporting the placement of COM with Fabidae (Figure 1). COM received 100% BS support, as did a clade of all COM and Fabidae species, but the precise placement of COM was uncertain. There was 52% BS support for a sister relationship of COM and all Fabidae except Bulnesia (Zygophyllales; Figure 1), which was sister to COM and other Fabidae species. Most chloroplast genes support a placement of COM with Fabidae, albeit generally with low BS support, and no chloroplast gene provides even 50% BS support for COM with Malvidae (Table S1). The analysis of the full chloroplast alignment with RY coding indicates 100% BS support for a clade of COM and Fabidae species (Figure S1). AA coding indicates 47% BS support for a clade of the COM species and all Fabidae except Bulnesia, which is placed in Malvidae, although with low support (Figure S2). Removing the highly variable nucleotide sites from the chloroplast alignment quickly erodes support for COM with Fabidae. Bootstrap support for COM with Fabidae drops from 98% to 48% to 1% with the removal of the 5%, 10%, and 20% most variable sites, respectively, with no support after removing more sites. However, none of the site removal analyses indicates any support for a clade of COM and Malvidae or COM outside of Fabidae + Malvidae. Removing the third codon positions from the 78-gene chloroplast data set reduced the BS support for a clade of COM and Fabidae species to 90%, with Bulnesia initially sister to COM (Figure S3).
Trees from analyses of the 79-taxon, 4-gene mitochondrial data set generally indicate a close relationship of COM with species from Malvidae (Figure 2). In the ML analysis of the full nucleotide data set, there is 94% BS support for a clade of COM species and all Malvidae except Stachyurus and Oenothera (Figure 2). Additionally, Guaiacum (Zygophyllales, Fabidae) is sister to Stachyurus (Crossosomatales, Malvidae) in agreement with Qiu et al. (2010). However, the placement of Guaiacum differs from that obtained in studies based largely on chloroplast genes (see APG III, 2009; Soltis et al., 2011). Analyses of the four individual mitochondrial genes either show weak (< 60%) BS support linking COM with Malvidae or yield trees that are unresolved, with little, if any, support for the monophyly of either clade or Rosidae (Table S1). RY coding greatly reduces support for relationships within Rosidae, with no support even for the monophyly of COM (not shown). AA coding indicates 74% BS support for a clade of COM species and all Malvidae except Stachyurus and Oenothera (Figure S4). Removing the most variable 5% of sites yields 100% BS support for a clade of COM species and all Malvidae except Stachyurus and Oenothera. However, removing more variable sites greatly reduces support for relationships throughout the tree; after removing the 10% most variable sites, BS support for COM drops to 23%. After removing the third codon position from the 4-gene mitochondrial data set, there is 31% BS support for a clade of COM species and all Malvidae except Stachyurus and Oenothera (Figure S5).
The results from analyses of the 92-taxon, 5-gene nuclear data set provide 100% BS support for a clade that includes COM species and all species of Malvidae except Pelargonium, Oenothera, and Lagerstroemia (Figure 3), as do the results from ML analyses of the RY and AA matrices (Figures S6, S7). This placement of COM with Malvidae is also in agreement with the ML analyses of the five individual nuclear genes, although with different levels of support (Table S1). Likewise, the ML analyses of nucleotides after removing 5% and 10% of the most variable sites yield 100% BS support for a clade that includes all of the COM species and Malvidae species, except Pelargonium, Oenothera, and Lagerstroemia. Removing 20% or 30% of the most variable sites reduces the support for this clade to 96% and 94%, respectively, but removing more sites greatly reduces support for relationships within Rosidae in general, including support for the monophyly of COM. Removing the third codon position from the 5-gene nuclear data set still resulted in 100% BS support for a clade of COM and all Malvidae species except Pelargonium, Oenothera, and Lagerstroemia (Figure S8).
3.2 Single-copy nuclear gene analysis
Although most of the orthologous gene sets from the Lee et al. (2011) analysis were not informative regarding the placement of COM, among those genes that do support one of the three possible placements (COM with Fabidae, COM with Malvidae, or COM outside of Fabidae + Malvidae), 62–75% support a clade of COM with Malvidae, 25–38% support a clade of COM with Fabidae, and none of the orthologous gene sets support COM outside of Fabidae + Malvidae (Table 2). While increasing the minimum bootstrap support cutoff reduces the number of orthologous gene sets supporting each hypothesis, it has relatively little effect on the percentage of informative genes supporting COM with Malvidae versus COM with Fabidae (Table 2).
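The way such percentages depend on the bootstrap cutoff can be illustrated with a short Python sketch. The function name and the input layout (one dictionary of bootstrap counts per gene) are hypothetical, and the rule used here, that a gene counts toward a placement only when exactly one placement reaches the cutoff, is an assumption rather than a description of the scripts used in this study.

```python
from collections import Counter


def tally(gene_counts, cutoff):
    """Share of informative genes backing each placement at a cutoff.

    gene_counts: one dict per gene, mapping a placement label to the
    number of bootstrap replicates (out of 100) that support it.
    """
    votes = Counter()
    for counts in gene_counts:
        passing = [label for label, n in counts.items() if n >= cutoff]
        if len(passing) == 1:  # exactly one placement clears the cutoff
            votes[passing[0]] += 1
    total = sum(votes.values())
    if total == 0:
        return {}
    return {label: votes[label] / total for label in votes}


# Hypothetical bootstrap counts for four genes (not data from this study).
toy_genes = [
    {"COM+Malvidae": 80, "COM+Fabidae": 10},
    {"COM+Malvidae": 60, "COM+Fabidae": 30},
    {"COM+Fabidae": 75, "COM+Malvidae": 5},
    {"COM+Malvidae": 40, "COM+Fabidae": 35},
]
shares_50 = tally(toy_genes, 50)
shares_70 = tally(toy_genes, 70)
```

Raising the cutoff in this toy example removes genes from the informative set, so the shares are always computed over the genes that remain informative at that cutoff.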
3.3 Multi-copy nuclear gene analysis
As in the single-copy nuclear gene analysis, most of the multi-copy gene trees were not informative regarding the placement of COM, but the majority of informative genes support the placement of COM with Malvidae. Between 71% and 98% of the informative genes support a clade of COM and Malvidae species, depending on the model of reconciliation and the minimum bootstrap support level (Table 3). The duplication-only model provides the strongest support for a clade of COM with Malvidae (≥ 91%; Table 3). The maximum percentage of informative genes supporting COM with Fabidae is 27%, which is based on the deep coalescence reconciliation model (Table 3). A placement of COM outside of Fabidae + Malvidae is recovered by 0–6% of the genes in these analyses (Table 3).
4. Discussion
4.1 Conflict among multi-locus phylogenetic analyses
In spite of much recent progress resolving the angiosperm tree of life, the phylogenetic placement of COM remains uncertain. Most previous efforts to place COM have used a variety of data sources, taxon sampling strategies, and phylogenetic methods (but see Xi et al., 2014). Therefore, it is difficult to determine if the conflicting placements of COM are due to errors or actual biological conflict among loci (Table 1). Our ML analyses of multi-gene chloroplast, mitochondrial, and nuclear data sets with similar taxon sampling reinforce the observation that analyses of chloroplast loci yield a topology that differs from analyses of mitochondrial and most nuclear loci (Table 1; Figures 1–3). The isolated placements of Lagerstroemia, Oenothera, Pelargonium, and Stachyurus make Malvidae non-monophyletic in some of our analyses of mitochondrial and nuclear data sets; however, their positions in our trees are consistent with those of Qiu et al. (2010) and Zhang et al. (2012), respectively. Furthermore, these four genera belong to Myrtales (Lagerstroemia, Lythraceae; Oenothera, Onagraceae), Geraniales (Pelargonium, Geraniaceae), and Crossosomatales (Stachyurus, Stachyuraceae), orders whose exact placement within Rosidae varies among chloroplast, mitochondrial, and nuclear trees (e.g., Morton, 2011; Qiu et al., 2010; Soltis et al., 2011; Xi et al., 2014; Zhang et al., 2012; Zhu et al., 2007).
Our analyses of the chloroplast, mitochondrial, and nuclear data sets are robust to different character-coding strategies, which are often used to detect heterogeneous phylogenetic signals or error. AA matrices and RY coding are used to ameliorate nucleotide saturation and composition biases (Delsuc et al., 2005; Gibson et al., 2005; Harrison et al., 2004; Hashimoto et al., 1995; Phillips and Penny, 2003), and removal of highly variable sites has been proposed to reduce long-branch attraction or model-fitting error (see Philippe et al., 2005). Some of these experiments erode phylogenetic signal for the placement of COM, but none support an alternative placement of COM. Although we failed to find obvious signs of major systematic or sampling biases, it is difficult to demonstrate the absence of error. In fact, the (weakly supported) variation in single-gene topologies of linked chloroplast genes suggests that some level of error may be present in chloroplast gene sequence analyses (Table S1). Nonetheless, the consistency of the incongruence suggests that there may be an underlying biological basis to the conflict among chloroplast, nuclear, and mitochondrial loci.
4.2 Evolutionary patterns suggested by conflict among nuclear loci
If the conflict among chloroplast, nuclear, and mitochondrial gene sequence data is due to evolutionary events such as ancient hybridization or incomplete lineage sorting, we would also expect to see conflict among independent nuclear loci. Indeed, within the single-copy nuclear gene data set from Lee et al. (2011), on average 66% of the informative genes support a placement of COM with Malvidae, while on average 34% weakly support the placement of COM with Fabidae (Table 2). The multi-copy genes reveal similar levels of incongruence: at least 71% of the informative genes support COM with Malvidae, with far less support for COM with Fabidae and very little support for COM outside a clade of Malvidae + Fabidae (Table 3). This predominant placement of COM with Malvidae within multi-copy gene trees is consistent with previous gene tree parsimony analyses (Burleigh et al., 2011; Górecki et al., 2012). The placement of COM from multi-copy genes is robust to the model of gene reconciliation (Table 3). Furthermore, in both the single- and multi-copy gene results, the overall percentage of informative genes supporting each of the three hypotheses is relatively stable regardless of the bootstrap cutoff used (Tables 2, 3).
If the differences in the position of COM among nuclear loci are not due to errors, they may reflect biological processes such as ancient hybridization and/or incomplete lineage sorting. Distinguishing between incomplete lineage sorting and hybridization can be challenging (e.g., Buckley et al., 2006; Joly et al., 2009; Holder et al., 2001; Holland et al., 2008; Sang and Zhong, 2000), and the sparse and incomplete taxon sampling within our nuclear gene data sets, as well as the ancient divergence time of the major rosid lineages (Bell et al., 2010; Wang et al., 2009), make it especially difficult to differentiate between the two. Although the effects of incomplete lineage sorting typically are studied on recent radiations, they can also obscure the resolution of ancient radiations (Oliver, 2013; Whitfield and Lockhart, 2007), such as the deep relationships among mammals (McCormack et al., 2012; Song et al., 2012). If we consider the placement of COM as a rooted 3-taxon (COM, Fabidae, and Malvidae) phylogenetic problem, a process of incomplete lineage sorting should yield approximately equal numbers of nuclear genes supporting the two possible non-species tree topologies (Huson et al., 2005). Instead, we see that the majority of genes supports COM with Malvidae, with far less support for COM with Fabidae, and almost no support for COM outside of Fabidae + Malvidae (Tables 2, 3). The differences in levels of support from nuclear loci suggest that incomplete lineage sorting does not explain the phylogenetic discordance among genes. However, because Malvidae and Fabidae are each non-monophyletic in some analyses (Figures 1–3), it is possible that this 3-taxon case does not apply, and the expected patterns of support from lineage sorting with more than three species are more complex (e.g., Rosenberg and Nordborg, 2002; Degnan and Rosenberg, 2006).
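The 3-taxon expectation used above can be made concrete with a brief Python sketch. It relies on the standard coalescent result that, for a rooted three-taxon species tree with an internal branch of length T in coalescent units, the gene tree matching the species tree has probability 1 - (2/3)exp(-T) and each of the two alternative topologies has probability (1/3)exp(-T). The function names are hypothetical, and the exact binomial test of equal minor-topology counts is offered only as one simple way to examine the symmetry; it was not part of the analyses reported here.

```python
import math


def triplet_probabilities(t):
    """Gene-tree topology probabilities for a rooted 3-taxon species tree.

    t is the internal branch length in coalescent units. Returns a tuple
    (concordant, discordant_1, discordant_2); the two discordant
    topologies are equally likely under incomplete lineage sorting alone.
    """
    minor = math.exp(-t) / 3.0
    return 1.0 - 2.0 * minor, minor, minor


def minor_asymmetry_p(n1, n2):
    """Two-sided exact binomial p-value for equal minor-topology counts.

    n1 and n2 are the numbers of genes supporting the two topologies
    that differ from the most frequent one; the null hypothesis is that
    each such gene falls in either class with probability 0.5.
    """
    n = n1 + n2
    k = min(n1, n2)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)
```

When T is zero the three topologies are equally likely, and as T grows the two minor topologies become rarer while remaining equal in frequency, which is why a strong excess of one minor topology over the other is not expected under incomplete lineage sorting alone in the 3-taxon case.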
Many plant lineages have experienced hybridization and introgression throughout their evolutionary histories (e.g., Okuyama et al., 2005), and there are more than a hundred records of interspecific hybridization among rosid taxa alone (Rieseberg and Soltis, 1991; Rieseberg et al., 1996a). An ancient introgressive hybridization event would likely produce conflict among independent loci (Wendel and Doyle, 1998). The different placement of COM in trees constructed from mitochondrial and chloroplast gene sequence data suggests that the evolutionary histories of these two subcellular compartments are unlinked, with the chloroplast genome derived from the Fabidae lineage and the mitochondrial genome from the Malvidae lineage. This result is unexpected given that the chloroplast and mitochondrial genomes typically are both maternally inherited in angiosperms (Birky, 1995, 2001; Corriveau and Coleman, 1988; Mogensen, 1996). However, there are documented cases of biparental inheritance of organellar genomes (e.g., Fauré et al., 1994; Havey et al., 1998; Testolin and Cipriani, 1997; Yang et al., 2000), and paternal inheritance of chloroplast genomes has been documented in COM species (Turnera ulmifolia; Shore et al., 1994; Shore and Triassi, 1998) and Fabidae (Medicago sativa; Masoud et al., 1990; Schumann and Hancock, 1989; Larrea; Yang et al., 2000). Additionally, empirical studies suggest that progeny from a hybridization event may exhibit strong paternal chloroplast inheritance, while mitochondrial inheritance remains exclusively maternal (Schumann and Hancock, 1989; Masoud et al., 1990; Shore et al., 1994; Xu, 2005). Thus, it is conceivable that an ancient hybridization event resulted in different evolutionary histories for the chloroplast and mitochondrial genomes.
In this putative ancient hybridization scenario, an early member of Fabidae or its
immediate ancestor acted as the paternal parent and crossed with the maternal lineage
of a member of Malvidae, with accompanying paternal transmission of the chloroplast
to the ancestor of COM (F1). This event could have created conflicting histories in the
chloroplast and mitochondrial genomes and conflict among nuclear loci, with half of
the alleles in the F1 contributed by each parent (Figures 1, 2, 4). Repeated selfing or
crossing of the hybrid derivatives would not explain the high percentage of nuclear
loci supporting the relationship of COM with Malvidae (Tables 2, 3), suggesting the
possibility of subsequent backcrosses of the early hybrids to the maternal Malvidae,
which would have reduced the number of nuclear loci supporting the placement of COM with
Fabidae (Figure 4). Considering the difficulty of producing fertile hybrids from
crosses of distantly related lineages, this proposed ancient hybridization event should
be viewed with caution.
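The dilution of the paternal nuclear contribution under recurrent backcrossing can be sketched with simple Mendelian halving. The example below is only an illustration under strong assumptions that are not part of the analyses in this study: no selection, no linkage effects, and every generation after the F1 backcrossed to the Malvidae parent. The function name is hypothetical. Under these assumptions the F1 carries half of its nuclear alleles from the Fabidae parent, and a single backcross generation gives an expected 0.25, which is of the same order as the rough 25% mentioned in the legend of Figure 4.

```python
def expected_donor_fraction(n_backcrosses: int) -> float:
    """Expected nuclear fraction from the non-recurrent (donor) parent.

    n_backcrosses = 0 is the F1; each backcross to the recurrent parent
    halves the expected donor contribution.
    """
    if n_backcrosses < 0:
        raise ValueError("n_backcrosses must be non-negative")
    return 0.5 ** (n_backcrosses + 1)


if __name__ == "__main__":
    for generation in range(4):
        fraction = expected_donor_fraction(generation)
        print(f"backcross generations: {generation}, donor fraction: {fraction}")
```

These are expectations only; realized fractions among loci would vary because of segregation, recombination, and selection.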
5. Concluding remarks
Numerous plant systematics studies have demonstrated the promise of genomic
data to resolve angiosperm relationships that were not evident in analyses with a few
genes (Burleigh et al., 2011; Finet et al., 2010; Lee et al., 2011; Moore et al., 2010,
2011; Zeng et al., 2014). We demonstrate here that analyses of data sets with many
unlinked loci can highlight the ambiguity and discordance in phylogenetic
relationships and potentially reveal the complexity of angiosperm evolution. Most, but
not all, single- and multi-copy nuclear loci, as well as mitochondrial genes, support
the placement of COM with Malvidae. This placement is also consistent with patterns
of morphological evolution (Endress and Matthews, 2006), but it contradicts the
strongly supported analyses of chloroplast sequence data sets (Figures 1–4; Jansen et
al., 2007; Moore et al., 2010, 2011; Ruhfel et al., 2014). While analyses involving a
single data source, such as the chloroplast genome, seek a single phylogeny, it may be
more informative to appreciate the potentially chimeric origins of COM rather than to
force its placement in a binary species tree. Although with current sampling we cannot
conclusively infer the processes that caused the conflicting placement of COM, our
analyses emphasize the importance of phylogenomic data for highlighting
phylogenetic incongruence and directing future studies.
Acknowledgements
We thank Yin-Long Qiu, who contributed to the early design of this project, and Ning
Zhang, who graciously provided us with the 92-taxon, 5-gene nuDNA alignment used
in this study. This work was supported by the National Natural Science Foundation of
China (NNSF 31270268), National Basic Research Program of China (No.
2014CB954101), Chinese Academy of Sciences Visiting Professorship for Senior
International Scientists (grant number 2011T1S24), State Key Laboratory of
Systematic and Evolutionary Botany (grant number LSEB2011-10), and the US
National Science Foundation (DEB-1301828).
The authors declare no conflict of interest.
References
Acosta, M.C., Premoli, A.C., 2010. Evidence of chloroplast capture in South American Nothofagus (subgenus Nothofagus, Nothofagaceae). Mol. Phylogenet. Evol. 54, 235-242.
Albach, D.C., Soltis, D.E., Soltis, P.S., Olmstead, R.G., 2001. Phylogenetic analysis of asterids based on sequences of four genes. Ann. Mo. Bot. Gard. 88, 163-212.
APG III, 2009. An update of the Angiosperm Phylogeny Group classification for the orders and families of flowering plants: APG III. Bot. J. Linn. Soc. 161, 105-121.
Bell, C.D., Soltis, D.E., Soltis, P.S., 2010. The age and diversification of the angiosperms re-revisited. Am. J. Bot. 97, 1296-1303.
Birky, C.W., 1995. Uniparental inheritance of mitochondrial and chloroplast genes: mechanisms and evolution. Proc. Natl. Acad. Sci. USA. 92, 11331-11338.
Birky, C.W., 2001. The inheritance of genes in mitochondria and chloroplasts: laws, mechanisms, models. Annu. Rev. Genet. 35, 125-148.
Bremer, K., Backlund, A., Sennblad, B., Swenson, U., Andreasen, K., Hjertson, M., Lundberg, J., Backlund, M., Bremer, B., 2001. A phylogenetic analysis of 100+ genera and 50+ families of euasterids based on morphological and molecular data with notes on possible higher level morphological synapomorphies. Pl. Syst. Evol. 229, 137-169.
Bremer, K., Friis, E., Bremer, B., 2004. Molecular phylogenetic dating of asterid flowering plants shows early Cretaceous diversification. Syst. Biol. 53, 496-505.
Buckley, T.R., Cordeiro, M., Marshall, D.C., Simon, C., 2006. Differentiating between hypotheses of lineage sorting and introgression in New Zealand alpine cicadas (Maoricicada Dugdale). Syst. Biol. 55, 411-425.
Burleigh, J.G., Hilu, K.W., Soltis, D.E., 2009. Inferring phylogenies with incomplete data sets: a 5-gene, 567-taxon analysis of angiosperms. BMC Evol. Biol. 9, 61.
Burleigh, J.G., Bansal, M.S., Eulenstein, O., Hartmann, S., Wehe, A., Vision, T.J., 2011. Genome-Scale Phylogenetics: Inferring the Plant Tree of Life from 18,896 Gene Trees. Syst. Biol. 60, 117-25.
Cantino, P.D., Doyle, J.A., Graham, S.W., Judd, W.S., Olmstead, R.G., Soltis, D.E., Soltis, P.S., Donoghue, M.J., 2007. Towards a phylogenetic nomenclature of Tracheophyta. Taxon 56, 822-846.
Chang, S.W., Oshida, T., Endo, H., Nguyen, S.T., Dang, C.N., Nguyen, D.X., Jiang, X., Li, Z.J., Lin, L.K., 2011. Ancient hybridization and underestimated species diversity in Asian striped squirrels (genus Tamiops): inference from paternal, maternal and biparental markers. J. Zool. 285, 128-138.
Chase, M.W., Soltis, D.E., Olmstead, R.G., Morgan, D., Les, D.H., Mishler, B.D., Duvall, M.R., Price, R.A., Hills, H.G., Qiu, Y.L., Kron, K.A., Rettig, J.H., Conti, E., Palmer, J.D., Manhart, J.R., Sytsma, K.J., Michaels, H.J., Kress, W.J., Karol, K.G., Clark, W.D., Hedren, M., Gaut, B.S., Jansen, R.K., Kim, K.J., Wimpee, C.F., Smith, J.F., Furnier, G.R., Strauss, S.H., Xiang, Q.Y., Plunkett, G.M., Soltis, P.S., Swensen, S.M., Williams, S.E., Gadek, P.A., Quinn, C.J., Eguiarte, L.E., Golenberg, E.Jr., Learn, G.H., Graham, S.W., Barrett, S.C.H., Dayanandan, S., Albert, V.A., 1993. Phylogenetics of seed plants: an analysis of nucleotide-sequences from the plastid gene rbcL. Ann. Mo. Bot. Gard. 80, 528-80.
Chase, M.W., Soltis, D.E., Soltis, P.S., Rudall, P.J., Fay, M.F., Hahn, W.J., Sullivan, S., Joseph, J., Molvray, M., Kores, P.J., Givnish, T.J., Sytsma, K.J., Pires, J.C., 2000. Higher-level systematics of the monocotyledons: An assessment of current knowledge and a new classification, in: Wilson, K.L., Morrison, D.A. (Eds.), Monocots: Systematics and Evolution. CSIRO Publishing, Collingwood, pp. 3-16.
Chen, F., Mackey, A.J., Stoeckert, C.J.Jr., Roos, D.S., 2006. OrthoMCL-DB: querying a comprehensive multi-species collection of ortholog groups. Nucleic Acids Res. 34, D363-368.
Comes, H.P., Abbott, R.J., 2001. Molecular phylogeography, reticulation, and lineage sorting in Mediterranean Senecio sect. Senecio (Asteraceae). Evolution 55, 1943-1962.
Corriveau, J.L., Coleman, A.W., 1988. Rapid screening method to detect potential biparental inheritance of plastid DNA and results over 200 angiosperm species. Am. J. Bot. 75, 1443-1458.
Cui, R., Schumer, M., Kruesi, K., Walter, R., Andolfatto, P., Rosenthal, G.G., 2013. Phylogenomics reveals extensive reticulate evolution in Xiphophorus fishes. Evolution 67, 2166-2179.
Degnan, J.H., Rosenberg, N.A., 2006. Discordance of species trees with their most likely gene trees. PLoS Genet. 2, e68.
Degnan, J.H., Rosenberg, N.A., 2009. Gene tree discordance, phylogenetic inference and the multispecies coalescent. Trends Ecol. Evol. 24, 332-340.
Delsuc, F., Brinkmann, H., Philippe, H., 2005. Phylogenomics and the reconstruction of the tree of life. Nat. Rev. Genet. 6, 361-375.
Doyle, J.J., 1992. Gene trees and species trees - molecular systematics as one-character taxonomy. Syst. Bot. 17, 144-163.
Drummond, A.J., Rambaut, A., 2007. BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evol. Biol. 7, 214.
Duarte, J.M., Wall, P.K., Edger, P.P., Landherr, L.L., Ma, H., Pires, J.C., Leebens-Mack, J., dePamphilis, C.W., 2010. Identification of shared single copy nuclear genes in Arabidopsis, Populus, Vitis and Oryza and their phylogenetic utility across various taxonomic levels. BMC Evol. Biol. 10, 61.
Dunn, C.W., Hejnol, A., Matus, D.Q., Pang, K., Browne, W.E., Smith, S.A., Seaver, E., Rouse, G.W., Obst, M., Edgecombe, G.D., et al., 2008. Broad phylogenomic sampling improves resolution of the animal tree of life. Nature 452, 745-749.
Durand, E.Y., Patterson, N., Reich, D., Slatkin, M., 2011. Testing for ancient admixture between closely related populations. Mol. Biol. Evol. 28, 2239-2252.
Endress, P.K., Matthews, M.L., 2006. Floral structure and systematics in four orders of rosids, including a broad survey of floral mucilage cells. Plant Syst. Evol. 260, 223-251.
Endress, P.K., Davis, C.C., Matthews, M.L., 2013. Advances in the floral structural characterization of the major subclades of Malpighiales, one of the largest orders of flowering plants. Ann. Bot. 111, 969-985.
Fauré, S., Noyer, J.L., Carreel, F., Horry, J.P., Bakry, F., Lanaud, C., 1994. Maternal inheritance of chloroplast genome and paternal inheritance of mitochondrial genome in bananas (Musa acuminata). Curr. Genet. 25, 265-269.
Finet, C., Timme, R.E., Delwiche, C.F., Marlétaz, F., 2010. Multigene phylogeny of the green lineage reveals the origin and diversification of land plants. Curr. Biol. 20, 2217-2222.
Gibson, A., Gowri-Shankar, V., Higgs, P.G., Rattray, M., 2005. A comprehensive analysis of mammalian mitochondrial genome base composition and improved phylogenetic methods. Mol. Biol. Evol. 22, 251-264.
Givnish, T.J., Pires, J.C., Graham, S.W., McPherson, M.A., Prince, L.M., Patterson, T.B., Rai, H.S., Roalson, E.R., Evans, T.M., Hahn, W.J., Millam, K.C., Meerow, A.W., Molvray, M., Kores, P., O'Brien, H.E., Kress, W.J., Hall, J., Sytsma, K.J., 2006. Phylogeny of the monocotyledons based on the highly informative plastid gene ndhF: Evidence for widespread concerted convergence, in: Columbus, J.T., Friar, E.A., Porter, J.M., Prince, L.M., Simpson, M.G. (Eds.), Monocots: Comparative Biology and Evolution Excluding Poales. Rancho Santa Ana Botanic Garden, California, pp. 28-51.
Givnish, T.J., Ames, M., McNeal, J.R., dePamphilis, C.W., Graham, S.W., Pires, J.C., Stevenson, D.W., Zomlefer, W.B., Briggs, B.G., Duvall, M.R., Moore, M.J., Heaney, J.M., Soltis, D.E., Soltis, P.S., Thiele, K., Leebens-Mack, J.H., 2010. Assembling the tree of the monocotyledons: Plastome sequence phylogeny and evolution of Poales. Ann. Mo. Bot. Gard. 97, 584-616.
Goodman, M., Czelusniak, J., Moore, G.W., Romero-Herrera, A.E., Matsuda, G., 1979. Fitting the gene lineage into its species lineage, a parsimony strategy illustrated by cladograms constructed by globin sequences. Syst. Zool. 28, 132-163.
Górecki, P., Burleigh, J.G., Eulenstein, O., 2012. GTP supertrees from unrooted gene trees: linear time algorithms for NNI based local searches, in: Bioinformatics Research and Applications, Springer, Berlin, pp. 102-114.
Goremykin, V.V., Nikiforova, S.V., Bininda-Emonds, O.R.P., 2010. Automated removal of noisy data in phylogenomic analyses. J. Mol. Evol. 71, 319-331.
Graham, S.W., Zgurski, J.M., McPherson, M.A., Cherniawsky, D.M., Saarela, J.M., Horne, E.S.C., Smith, S.Y., Wong, W.A., O'Brien, H.E., Pires, J.C., Olmstead, R.G., Chase, M.W., Rai, H.S., 2006. Robust inference of monocot deep phylogeny using an expanded multigene plastid data set. Aliso 22, 3-20.
Harrison, G.A., McLenachan, P.A., Phillips, M.J., Slack, K.E., Cooper, A., Penny, D., 2004. Four new avian mitochondrial genomes help get to basic evolutionary questions in the late Cretaceous. Mol. Biol. Evol. 21, 974-983.
Hashimoto, T., Nakamura, Y., Kamaishi, T., Nakamura, F., Adachi, J., Okamoto, K., Hasegawa, M., 1995. Phylogenetic place of mitochondrion-lacking protozoan, Giardia lamblia, inferred from amino acid sequences of elongation factor 2. Mol. Biol. Evol. 12, 782-793.
Havey, M.J., McCreight, J.D., Rhodes, B., Taurick, G., 1998. Differential transmission of the Cucumis organellar genomes. Theor. Appl. Genet. 97, 122-128.
Hilu, K.W., Borsch, T., Müller, K., Soltis, D.E., Soltis, P.S., Savolainen, V., Chase, M.W., Powell, M.P., Alice, L.A., Evans, R., Sauquet, H., Neinhuis, C., Slotta, T.A.B., Rohwer, J.G., Campbell, C.S., Chatrou, L.W., 2003. Angiosperm Phylogeny Based on matK Sequence Information. Am. J. Bot. 90, 1758-1776.
Holder, M.T., Anderson, J.A., Holloway, A.K., 2001. Difficulties in detecting hybridization. Syst. Biol. 50, 978-982.
Holland, B.R., Benthin, S., Lockhart, P.J., Moulton, V., Huber, K.T., 2008. Using supernetworks to distinguish hybridization from incomplete lineage sorting. BMC Evol. Biol. 8, 202.
Hudson, R.R., 1983. Testing the constant-rate neutral allele model with protein sequence data. Evolution 37, 203-217.
Huson, D.H., Klöpper, T., Lockhart, P.J., Steel, M.A., 2005. Reconstruction of reticulate networks from gene trees, in: Proceedings of the Ninth International Conference on Research in Computational Molecular Biology, Springer, Heidelberg, pp. 233-249.
Jansen, R.K., Cai, Z., Raubeson, L.A., Daniell, H., dePamphilis, C.W., Leebens-Mack, M.J., Muller, K.F., Guisinger-Bellian, M., Haberle, R.C., Hansen, A.K., 2007. Analysis of 81 genes from 64 plastid genomes resolves relationships in angiosperms and identifies genome-scale evolutionary patterns. Proc. Natl. Acad. Sci. USA. 104, 19369-19374.
Davis, J.I., Stevenson, D.W., Petersen, G., Seberg, O., Campbell, L.M., Freudenstein, J.V., Goldman, D.H., Hardy, C.R., Michelangeli, F.A., Simmons, M.P., Specht, C.D., Vergara-Silva, F., Gandolfo, M., 2004. A phylogeny of the monocots, as inferred from rbcL and atpA sequence variation, and a comparison of methods for calculating jackknife and bootstrap values. Syst. Bot. 29, 467-510.
Joly, S., McLenachan, P.A., Lockhart, P.J., 2009. A statistical approach for distinguishing hybridization and incomplete lineage sorting. Am. Nat. 174, E54-E70.
Jones, D.T., Taylor, W.R., Thornton, J.M., 1992. The rapid generation of mutation data matrices from protein sequences. CABIOS. 8, 275-282.
Judd, W.S., Olmstead, R.G., 2004. A survey of tricolpate eudicot phylogenetic relationships. Am. J. Bot. 91, 1627-1644.
Junier, T., Zdobnov, E.M., 2010. The Newick utilities: high-throughput phylogenetic tree processing in the UNIX shell. Bioinformatics 26, 1669-1670.
Katoh, K., Kuma, K.I., Toh, H., Miyata, T., 2005. MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Res. 33, 511-518.
Lee, E.K., Cibrian-Jaramillo, A., Kolokotronis, S.O., Katari, M.S., Stamatakis, A., Ott, M., Chiu, J.C., Little, D.P., Stevenson, D.W., McCombie, W.R., Martienssen, R.A., Coruzzi, G., DeSalle, R., 2011. A functional phylogenomic view of the seed plants. PLoS Genet. 7, e1002411.
Linder, C.R., Rieseberg, L.H., 2004. Reconstructing patterns of reticulate evolution in plants. Am. J. Bot. 91, 1700-1708.
Liu, L., Yu, L., Pearl, D.K., Edwards, S.V., 2009. Estimating species phylogenies using coalescence times among sequences. Syst. Biol. 58, 468-477.
Liu, L., Yu, L., Edwards, S.V., 2010. A maximum pseudo-likelihood approach for estimating species trees under the coalescent model. BMC Evol. Biol. 10, 302.
Maddison, W.P., 1997. Gene trees in species trees. Syst. Biol. 46, 523-536.
Maddison, W.P., Knowles, L.L., 2006. Inferring phylogeny despite incomplete lineage sorting. Syst. Biol. 55, 21-30.
Masoud, S.A., Johnson, L.B., Sorensen, E.L., 1990. High transmission of paternal DNA in alfalfa plants demonstrated by restriction fragment polymorphic analysis. Theor. Appl. Genet. 79, 49-55.
McCormack, J.E., Faircloth, B.C., Crawford, N.G., Gowaty, P.A., Brumfield, R.T., Glenn, T.C., 2012. Ultraconserved elements are novel phylogenetic markers that resolve placental mammal phylogeny when combined with species-tree analysis. Genome Res. 22, 746-754.
Mogensen, H.L., 1996. The hows and whys of cytoplasmic inheritance in seed plants. Am. J. Bot. 83, 383-404.
Moore, M.J., Soltis, P.S., Bell, C.D., Burleigh, J.G., Soltis, D.E., 2010. Phylogenetic analysis of 83 plastid genes further resolved the early diversification of eudicots. Proc. Natl. Acad. Sci. USA. 107, 4623-4628.
Moore, M.J., Hassan, N., Gitzendanner, M.A., Bruenn, R.A., Croley, M., Vandeventer, A., Horn, J.W., Dhingra, A., Brockington, S.F., Latvis, M., Ramdial, J., Alexandre, R., Piedrahita, A., Xi, Z., Davis, C.C., Soltis, P.S., Soltis, D.E., 2011. Phylogenetic analysis of the plastid inverted repeat for 244 species: insights into deeper-level angiosperm relationships from a long, slowly evolving sequence region. Int. J. Plant Sci. 172, 541-558.
Morton, M.C., 2011. Newly Sequenced Nuclear Gene (Xdh) for Inferring Angiosperm Phylogeny. Ann. Mo. Bot. Gard. 98, 63-89.
Okuyama, Y., Fujii, N., Wakabayashi, M., Kawakita, A., Ito, M., Watanabe, M., Murakami, N., Kato, M., 2005. Nonuniform concerted evolution and chloroplast capture: heterogeneity of observed introgression patterns in three molecular data partition phylogenies of Asian Mitella (Saxifragaceae). Mol. Biol. Evol. 22, 285-296.
Oliver, J.C., 2013. Microevolutionary processes generate phylogenomic discordance at ancient divergences. Evolution 67, 1823-1830.
Olmstead, R.G., Kim, K.J., Jansen, R.K., Wagstaff, S.J., 2000. The phylogeny of the Asteridae sensu lato based on chloroplast ndhF gene sequences. Mol. Phylogenet. Evol. 16, 96-112.
Page, R.D.M., Charleston, M.A., 1998. Trees within trees: phylogeny and historical associations. Trends Ecol. Evol. 13, 356-359.
Philippe, H., Delsuc, F., Brinkmann, H., Lartillot, N., 2005. Phylogenomics. Annu. Rev. Ecol. Evol. Syst. 36, 541-562.
Phillips, M.J., Penny, D., 2003. The root of the mammalian tree inferred from whole mitochondrial genomes. Mol. Phylogenet. Evol. 28, 171-185.
Phillips, M.J., Delsuc, F., Penny, D., 2004. Genome-Scale Phylogeny and the Detection of Systematic Biases. Mol. Biol. Evol. 21, 1455-1458.
Qiu, Y.L., Li, L.B., Wang, B., Xue, J.Y., Hendry, T.A., Li, R.Q., Brown, J.W., Liu, Y., Hudson, Y.H., Chen, Z.D., 2010. Angiosperm phylogeny inferred from sequences of four mitochondrial genes. J. Syst. Evol. 48, 391-425.
Rajan, V., 2013. A method of alignment masking for refining the phylogenetic signal of multiple sequence alignments. Mol. Biol. Evol. 30, 689-712.
Rasmussen, M.D., Kellis, M., 2011. A Bayesian approach for fast and accurate gene tree reconstruction. Mol. Biol. Evol. 28, 273-290.
Rasmussen, M.D., Kellis, M., 2012. Unified modeling of gene duplication, loss and coalescence using a locus tree. Genome Res. 22, 755-65.
Regier, J.C., Zwick, A., 2011. Sources of signal in 62 protein-coding nuclear genes for higher-level phylogenetics of arthropods. PLoS ONE 6, e23408.
Rieseberg, L.H., Soltis, D.E., 1991. Phylogenetic consequences of cytoplasmic gene flow in plants. Evol. Trends Plants 5, 65-84.
Rieseberg, L.H., Desrochers, A.M., Youn, S.J., 1995. Interspecific pollen competition as a reproductive barrier between sympatric species of Helianthus (Asteraceae). Am. J. Bot. 82, 515-519.
Rieseberg, L.H., Whitton, J., Linder, C.R., 1996a. Molecular marker incongruence in plant hybrid zones and phylogenetic trees. Acta Bot. Neerl. 45, 243-262.
Rieseberg, L.H., Sinervo, B., Linder, C.R., Ungerer, M.C., Arias, D.M., 1996b. Role of gene interactions in hybrid speciation: evidence from ancient and experimental hybrids. Science 272, 741-744.
Rosenberg, N.A., Nordborg, M., 2002. Genealogical trees, coalescent theory and the analysis of genetic polymorphisms. Nat. Rev. Genet. 3, 380-390.
Ruhfel, B.R., Gitzendanner, M.A., Soltis, D.E., Soltis, P.S., Burleigh, J.G., 2014. From algae to angiosperms – inferring the phylogeny of green plants (Viridiplantae) from 360 plastid genomes. BMC Evol. Biol. 14, 23.
Saarela, J.M., Graham, S.W., 2010. Inference of phylogenetic relationships among the subfamilies of grasses (Poaceae: Poales) using meso-scale sampling of the plastid genome. Botany 88, 65-84.
Sang, T., Zhong, Y., 2000. Testing hybridization hypotheses based on incongruent gene trees. Syst. Biol. 49, 422-434.
Savolainen, V., Fay, M.F., Albach, D.C., Backlund, A., van der Bank, M., Cameron, K.M., Johnson, S.A., Lledó, M.D., Pintaud, J.C., Powell, M., Sheahan, M.C., Soltis, D.E., Soltis, P.S., Weston, P., Whitten, W.M., Wurdack, K.J., Chase, M.W., 2000. Phylogeny of the eudicots: A nearly complete familial analysis based on rbcL gene sequences. Kew Bull. 55, 257-309.
Schumann, C.M., Hancock, J.F., 1989. Paternal inheritance of plastids in Medicago sativa. Theor. Appl. Genet. 78, 863-866.
Shore, J.S., Mcqueen, K.L., Little, S.H., 1994. Inheritance of plastid DNA in the Turnera ulmifolia complex (Turneraceae). Am. J. Bot. 81, 1636-1639.
Shore, J.S., Triassi, M., 1998. Paternally biased cpDNA inheritance in Turnera ulmifolia (Turneraceae). Am. J. Bot. 85, 328-332.
Shulaev, V., Sargent, D.J., Crowhurst, R.N., Mockler, T.C., Folkerts, O., Delcher, A.L., Jaiswal, P., Mockaitis, K., Liston, A., Mane, S.P., Burns, P., Davis, T.M., Slovin, J.P., Bassil, N., Hellens, R.P., Evans, C., Harkins, T., Kodira, C., Desany, B., Crasta, O.R., Jensen, R.V., Allan, A.C., Michael, T.P., Setuba, J.C., Celton, J., Rees, D.J.G., Williams, K.P., Holt, S.H., Rojas, J.J.R., Chatterjee, M., Liu, B., Silva, H., Meisel, L., Adato, A., Filichkin, S.A., Troggio, M., Viola, R., Ashman, T., Wang, H., Dharmawardhana, P., Elser, J., Raja, R., Priest, H.D., Bryant, D.W.Jr., Fox, S.E., Givan, S.A., Wilhelm, L.J., Naithani, S., Christoffels, A., Salama, D.Y., Carter, J., Girona, E.L., Zdepski, A., Wang, W., Kerstetter, R.A., Schwab, W., Korban, S.S., Davik, J., Monfort, A., Denoyes-Rothan, B., Arus, P., Mittler, R., Flinn, B., Aharoni, A., Bennetzen, J.L., Salzberg, S.L., Dickerman, A.W., Velasco, R., Borodovsky, M., Veilleux, R.E., Folta, K.M., 2010. The genome of woodland strawberry (Fragaria vesca). Nat. Genet. 43, 109-116.
Simon, S., Narechania, A., DeSalle, R., Hadrys, H., 2012. Insect phylogenomics: Exploring the source of incongruence using new transcriptomic data. Genome Biol. Evol. 4, 1295-1309.
Smith, S.A., Wilson, N.G., Goetz, F.E., Feehery, C., Andrade, S.C.S., Rouse, G.W., Giribet, G., Dunn, C.W., 2011. Resolving the evolutionary relationships of molluscs with phylogenomic tools. Nature 480, 364-367.
Soltis, D.E., Soltis, P.S., Chase, M.W., 1999. Angiosperm phylogeny inferred from multiple genes as a tool for comparative biology. Nature 402, 402-404.
Soltis, D.E., Kuzoff, R.K., 1995. Discordance between nuclear and chloroplast phylogenies in the Heuchera group (Saxifragaceae). Evolution 49, 727-742.
Soltis, D.E., Soltis, P.S., Chase, M.W., Mort, M.E., Albach, D.C., Zanis, M., Savolainen, V., Hahn, W.H., Hoot, S.B., Fay, M.F., Axtell, M., Swensen, S.M., Prince, L.M., Kress, W.J., Nixon, K.C., Farris, J.S., 2000. Angiosperm phylogeny inferred from 18S rDNA, rbcL, and atpB sequences. Bot. J. Linn. Soc. 133, 381-461.
Soltis, D.E., Soltis, P.S., Endress, P.K., Chase, M.W., 2005. Phylogeny and evolution of angiosperms. Sinauer Associates, Sunderland.
Soltis, D.E., Gitzendanner, M.A., Soltis, P.S., 2007. A 567-taxon data set for angiosperms: The challenges posed by Bayesian analyses of large data sets. Int. J. Plant Sci. 168, 137-157.
Soltis, D.E., Moore, M.J., Burleigh, J.G., Bell, C.D., Soltis, P.S., 2009. Molecular markers and concepts of plant evolutionary relationships: progress, promise, and future prospects. CRC Crit. Rev. Plant Sci. 28, 1-15.
Soltis, D.E., Soltis, P.S., 2009. The role of hybridization in plant speciation. Annu. Rev. Plant Biol. 60, 561-588.
Soltis, D.E., Smith, S.A., Cellinese, N., Wurdack, K.J., Tank, D.C., Brockington, S.F., Refulio-Rodriguez, N.F., Walker, J.B., Moore, M.J., Carlsward, B.S., et al., 2011. Angiosperm phylogeny: 17 genes, 640 taxa. Am. J. Bot. 98, 704-730.
Song, S., Liu, L., Edwards, S.V., Wu, S., 2012. Resolving conflict in eutherian mammal phylogeny using phylogenomics and the multispecies coalescent model. Proc. Natl. Acad. Sci. USA. 109, 14942-14947.
Stamatakis, A., Hoover, P., Rougemont, J., 2008. A rapid bootstrap algorithm for the RAxML web servers. Syst. Biol. 57, 758-771.
Testolin, R., Cipriani, G., 1997. Paternal inheritance of chloroplast DNA and maternal inheritance of mitochondrial DNA in the genus Actinidia. Theor. Appl. Genet. 94, 897-903.
Tsitrone, A., Kirkpatrick, M., Levin, D.A., 2003. A model for chloroplast capture. Evolution 57, 1776-82.
Wang, H.C., Moore, M.J., Soltis, P.S., Bell, C.D., Brockington, S.F., Alexandre, R., Davis, C.C., Latvis, M., Manchester, S.R., Soltis, D.E., 2009. Rosids radiation and the rapid rise of angiosperm-dominated forests. Proc. Natl. Acad. Sci. USA. 106, 3853-3858.
Wendel, J.F., Schnabel, A., Seelanan, T., 1995. An unusual ribosomal DNA sequence from Gossypium gossypioides reveals ancient, cryptic, intergenomic introgression. Mol. Phylogenet. Evol. 4, 298-313.
Wendel, J.F., Doyle, J.J., 1998. Phylogenetic incongruence: window into genome history and molecular evolution, in: Soltis, D.E., Soltis, P.S., Doyle, J.J. (Eds.), Molecular systematics of plants II: DNA sequencing. Kluwer Academic Publishers, Boston, pp. 256-296.
Whitfield, J.B., Lockhart, P.J., 2007. Deciphering ancient rapid radiations. Trends Ecol. Evol. 22, 258-265.
Xi, Z., Liu, L., Rest, J.S., Davis, C.C., 2014. Coalescent versus concatenation methods and the placement of Amborella sister to water lilies. Syst. Biol. 63, 919-934.
Xiang, Q.Y.J., Manchester, S.R., Thomas, D.T., Zhang, W., Fan, C., 2005. Phylogeny, biogeography, and molecular dating of cornelian cherries (Cornus, Cornaceae): tracking Tertiary plant migration. Evolution 59, 1685-1700.
Xu, J., 2005. The inheritance of organelle genes and genomes: patterns and mechanisms. Genome 48, 951-958.
Yang, T.W., Yang, Y.A., Xiong, Z., 2000. Paternal inheritance of chloroplast DNA in interspecific hybrids in the genus Larrea (Zygophyllaceae). Am. J. Bot. 87, 1452-1458.
Yoder, J.B., Briskine, R., Mudge, J., Farmer, A., Paape, T., Steele, K., Weiblen, G.D., Bharti, A.K., Zhou, P., May, G.D., Young, N.D., Tiffin, P., 2013. Phylogenetic signal variation in the genomes of Medicago (Fabaceae). Syst. Biol. 62, 424-438.
Zeng, L., Zhang, Q., Sun, R., Kong, H., Zhang, N., Ma, H., 2014. Resolution of deep angiosperm phylogeny using conserved nuclear genes and estimates of early divergence times. Nat. Commun. 5, doi:10.1038/ncomms5956.
Zhang, N., Zeng, L.P., Shan, H.Y., Ma, H., 2012. Highly conserved low-copy nuclear genes as effective markers for phylogenetic analyses in angiosperms. New Phytol. 195, 923-937.
Zhang, J.Q., Meng, S.Y., Allen, G.A., Wen, J., Rao, G.Y., 2014. Rapid radiation and dispersal out of the Qinghai-Tibetan Plateau of an alpine plant lineage Rhodiola (Crassulaceae). Mol. Phylogenet. Evol. 77, 147-158.
Zhu, X.Y., Chase, M.W., Qiu, Y.L., Kong, H.Z., Dilcher, D.L., Li, J.H., Chen, Z.D., 2007. Mitochondrial matR sequences help to resolve deep phylogenetic relationships in rosids. BMC Evol. Biol. 7, 217.
Zou, X.H., Zhang, F.M., Zhang, J.G., Zang, L.L., Tang, L., Wang, J., Sang, T., Ge, S., 2008. Analysis of 142 genes resolves the rapid diversification of the rice genus. Genome Biol. 9, R49.
Figure Legends
Figure 1: Majority-rule consensus tree of Maximum Likelihood (ML) bootstrap
analysis of the 78-gene chloroplast matrix. These data place COM with Fabidae. To
highlight the position of COM, only the Fabidae, COM, and Malvidae clades are
labeled, and COM is isolated from the circumscription of Fabidae as previously
defined (Cantino et al., 2007). All the familial and ordinal names of the sampled taxa
follow APG III (2009). Numbers above branches are bootstrap percentages (BS).
Figure 2: Majority-rule consensus tree of Maximum Likelihood (ML) bootstrap
analysis of the 4-gene mitochondrial matrix. These data place COM with Malvidae.
Figure 3: Majority-rule consensus tree of Maximum Likelihood (ML) bootstrap
analysis of the 5-gene nuclear matrix. These data place COM with Malvidae.
822
Figure 4: Hypothetical reticulation scenario for the origin of COM from the ancestral Fabidae and Malvidae lineages. Large circles reflect the plant lineages; the small circles represent their nuclear DNA types (the red circle represents the Fabidae nuclear DNA and the blue one the Malvidae nuclear DNA), the ovals represent the chloroplast (the green ovals represent the Fabidae chloroplast, and the gray oval represents the chloroplast from the Malvidae ancestor), and the diamonds represent the mitochondria (the gray diamond represents the Fabidae mitochondrion and the orange diamond the Malvidae mitochondrion); dashed arrows represent multiple generations of backcrossing. During hybridization, the mitochondrion is maternally inherited from the Malvidae ancestor, and the chloroplast is paternally inherited from the Fabidae ancestor. After subsequent F1 backcrosses to the Malvidae, the resulting generations contain chloroplasts from Fabidae, mitochondria from Malvidae, and a majority of the nuclear genes from Malvidae, with a smaller number from Fabidae (roughly 25%). The reticulate phylogeny at the bottom illustrates this hypothetical introgressive hybridization scenario and shows the phylogenetic incongruence among the three genomes with respect to COM.
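
The "roughly 25%" Fabidae nuclear contribution in the Figure 4 legend can be related to a simple expectation. The sketch below is illustrative only: it assumes strictly neutral, unlinked nuclear loci and repeated backcrossing of the F1 to the Malvidae parent, and the function name `donor_nuclear_fraction` is hypothetical, not part of the original analysis.

```python
def donor_nuclear_fraction(backcrosses: int) -> float:
    """Expected share of the nuclear genome from the donor (Fabidae) parent.

    Assumption (not from the manuscript): neutral, unlinked loci, so the
    donor share halves with each backcross to the recurrent (Malvidae)
    parent. The F1 itself (0 backcrosses) carries 50% from each parent.
    """
    if backcrosses < 0:
        raise ValueError("backcrosses must be non-negative")
    return 0.5 ** (backcrosses + 1)


if __name__ == "__main__":
    for n in range(4):
        share = donor_nuclear_fraction(n)
        print(f"backcrosses={n}: expected Fabidae nuclear share = {share:.1%}")
```

Under these assumptions, the legend's value of roughly 25% corresponds to the expectation after a single backcross generation.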

Table 1. Summary of the placement of COM in previous phylogenetic studies.

Genome type | Relationship (a) | Method of analysis/Support (b) | Marker (c) | Taxa number | COM sampling | References
Chloroplast | Nr | Character-state weighting/– | rbcL | 499 | – | Chase et al., 1993
Chloroplast | COM + Fabidae | Parsimony/52% JK; BI/1.0 PP | matK | 374 | 16 | Hilu et al., 2003
Chloroplast | COM + Fabidae | Parsimony jackknifing/77% JK; BI/1.0 PP | rbcL, atpB, 18S rDNA | 560 | 64 | Soltis et al., 2000; Soltis et al., 2007
Chloroplast | COM + Fabidae | ML/100% BS; MP/79% BS; BI/1.0 PP | 81 cp | 64 | 3 | Jansen et al., 2007
Chloroplast | COM + Fabidae | ML/89% BS | rbcL, atpB, matK, 18S rDNA, 26S rDNA | 567 | 59 | Burleigh et al., 2009
Chloroplast | COM + Fabidae | ML/100% BS | 10 cp, 2 nu | 117 | 33 | Wang et al., 2009
Chloroplast | COM + Fabidae | ML/53% BS | 83 cp | 86 | 5 | Moore et al., 2010
Chloroplast | COM + Fabidae | ML/99% BS (244 taxa); ML/89% BS (87 taxa) | IR | 244 | 14 | Moore et al., 2011
Chloroplast | COM + Fabidae | ML/57% BS | 11 cp, 2 nu, 4 mt | 640 | 154 | Soltis et al., 2011
Chloroplast | COM + Fabidae | ML/81% BS, 70% BS, 82% BS, 69% BS (ntAll, ntNo3rd, RY, AA) | 78 cp | 360 | 9 | Ruhfel et al., 2014
Chloroplast | Malpighiales-F | ML/100% BS; BI/1.0 PP | 45 cp | 41 | 3 | Xi et al., 2014
Mitochondrial | COM + Malvidae | ML/54% BS; MP/– | matR | 174 | 21 | Zhu et al., 2007
Mitochondrial | COM + Malvidae | ML/99% BS | atp1, matR, nad5, rps3 | 380 | 26 | Qiu et al., 2010
Nuclear | Nr | – | 18S rDNA | 233 | / | Soltis et al., 1997
Nuclear | Oxalidales-M | ML/55% BS | Xdh | 247 | 19 | Morton, 2011
Nuclear | COM + Malvidae | ML/>95% BS; BI/1.0 PP | SMC1, SMC2, MCM5, MLH1, MSH1 | 94 | 5 | Zhang et al., 2012
Nuclear | Malpighiales-M | GTP-ML/18% BS (136 taxa); GTP-ML/75% BS (54 taxa) | 18,896 gene trees | 136 | 15 | Burleigh et al., 2010
Nuclear | COM + Malvidae | ML/>95% BS; MP/≤65% BS | nuclear genome | 101 | 7 | Lee et al., 2011
Nuclear | Malpighiales-M | STAR/100% BP; MP-EST/100% BP; ML/100% BP; PhyloBayes/1.0 PP | 310 nu | 46 | 3 | Xi et al., 2014

Note:
(a) Nr = not resolved; COM + Fabidae = COM clade was placed in Fabidae; Malpighiales-F = only one member of COM, Malpighiales, was sampled and placed in Fabidae; COM + Malvidae = COM clade sister to Malvidae; Oxalidales-M = only Oxalidales sister to Malvidae; Malpighiales-M = only Malpighiales sister to Malvidae.
(b) JK = Jackknife value; BI = Bayesian inference; BS = Bootstrap value; BP = Bootstrap percentage; PP = Posterior probabilities; GTP = Gene tree parsimony; STAR = a coalescent method: Species Tree Estimation using Average Ranks of Coalescence (Liu et al., 2009); MP-EST = a coalescent method: Maximum Pseudo-likelihood for Estimating Species Trees (Liu et al., 2010); PhyloBayes = a Bayesian Markov chain Monte Carlo (MCMC) sampler for phylogenetic reconstruction using infinite mixtures (http://megasun.bch.umontreal.ca/People/lartillot/www/downloadmpi.html).
(c) 81 cp = 81 chloroplast genes (Jansen et al., 2007); 10 cp, 2 nu = 10 chloroplast genes, rbcL, atpB, matK, psbBTNH region (4 genes), rpoC2, ndhF, and rps4, and two nuclear genes, 18S rDNA and 26S rDNA; 83 cp = 83 chloroplast genes (Moore et al., 2010); IR = the 25,000-bp plastid Inverted Repeat region; 11 cp, 2 nu, 4 mt = 11 chloroplast genes, rbcL, atpB, matK, psbBTNH region (4 genes), rpoC2, ndhF, rps4, and rps16, and two nuclear genes, 18S rDNA and 26S rDNA, and four mitochondrial genes, atp1, matR, nad5, and rps3; ntAll, ntNo3rd, RY, AA = four different character-coding matrices; ntAll = all nucleotide positions analysis; ntNo3rd = the first and second codon positions analysis; RY = RY-coded analysis; AA = the amino acid analysis; 78 cp = 78 chloroplast genes (Ruhfel et al., 2014); 45 cp = 45 chloroplast genes selected from the plastid genome (see Supplementary Table S2 in Xi et al., 2014); 310 nu = 310 low-copy nuclear genes selected from nuclear genomes and transcripts (see Supplementary Table S1 in Xi et al., 2014).

Table 2. Results from the single-copy nuclear gene analysis based on the ortholog alignments from Lee et al. (2011).

% BS | Fabidae | Malvidae | Fabidae + Malvidae
50 | 115 (36%) | 208 (64%) | 0
60 | 68 (34%) | 131 (66%) | 0
70 | 49 (36%) | 89 (64%) | 0
80 | 29 (38%) | 48 (62%) | 0
90 | 15 (35%) | 28 (65%) | 0
100 | 3 (25%) | 9 (75%) | 0

Note:
The columns list the number of orthologs (out of 8,445) and the percentage (in parentheses) of all the informative genes that support a 'COM + Fabidae' topology ("Fabidae" column), a 'COM + Malvidae' topology ("Malvidae" column), or COM outside Fabidae + Malvidae ("Fabidae + Malvidae" column) with at least 50% bootstrap (BS) support.
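
As a check on how the percentages in Table 2 relate to the counts, the short sketch below recomputes them. It assumes the percentages are taken over the informative genes at each threshold (the sum of the three columns in a row) and rounded to the nearest integer; the names `TABLE2_COUNTS` and `row_percentages` are illustrative, not from the original analysis.

```python
# Counts from Table 2: % BS threshold -> (Fabidae, Malvidae, Fabidae + Malvidae)
TABLE2_COUNTS = {
    50: (115, 208, 0),
    60: (68, 131, 0),
    70: (49, 89, 0),
    80: (29, 48, 0),
    90: (15, 28, 0),
    100: (3, 9, 0),
}


def row_percentages(row):
    """Percentage of informative genes supporting each topology in one row.

    Assumption: the denominator is the row total (informative genes at that
    bootstrap threshold), not the full set of 8,445 orthologs.
    """
    total = sum(row)
    return tuple(round(100 * n / total) for n in row)


if __name__ == "__main__":
    for threshold, row in TABLE2_COUNTS.items():
        fab, mal, out = row_percentages(row)
        print(f">= {threshold}% BS: Fabidae {fab}%, Malvidae {mal}%, outside {out}%")
```

Under this assumption the recomputed values agree with the percentages printed in the table, with 'COM + Malvidae' supported by roughly two thirds of the informative orthologs at every threshold.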

Table 3. Results from the multi-copy gene tree analysis based on genome sequences of 22 land plant taxa.

Gene Duplications
% BS | Fabidae | Malvidae | Fabidae + Malvidae
50 | 38 (3%) | 973 (91%) | 62 (6%)
60 | 17 (2%) | 723 (94%) | 29 (4%)
70 | 12 (2%) | 515 (96%) | 11 (2%)
80 | 4 (1%) | 308 (98%) | 3 (1%)
90 | 2 (1%) | 155 (98%) | 1 (1%)
100 | 1 (6%) | 15 (94%) | 0 (0%)

Gene Duplications and Losses
% BS | Fabidae | Malvidae | Fabidae + Malvidae
50 | 446 (20%) | 1718 (76%) | 82 (4%)
60 | 267 (16%) | 1390 (81%) | 47 (3%)
70 | 149 (12%) | 1049 (86%) | 26 (2%)
80 | 72 (9%) | 715 (90%) | 11 (1%)
90 | 30 (8%) | 371 (91%) | 5 (1%)
100 | 7 (11%) | 54 (89%) | 0 (0%)

Deep Coalescence
% BS | Fabidae | Malvidae | Fabidae + Malvidae
50 | 547 (27%) | 1468 (71%) | 40 (2%)
60 | 358 (23%) | 1160 (75%) | 25 (2%)
70 | 229 (20%) | 884 (79%) | 13 (1%)
80 | 114 (16%) | 598 (83%) | 8 (1%)
90 | 58 (16%) | 299 (83%) | 4 (1%)
100 | 7 (13%) | 44 (85%) | 1 (2%)

Note:
We calculated the reconciliation cost based on the minimum number of implied gene duplications, duplications + losses, and deep coalescence events, for 100 ML bootstrap gene trees made from 3,784 multi-copy gene alignments. In the table, we list the number and percentage (in parentheses) of genes (out of 3,784) that have at least 50% bootstrap (BS) support for the three possible topologies based on the reconciliation cost, respectively.
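
To make the reconciliation-cost idea in the note to Table 3 concrete, the following is a minimal sketch for the duplication criterion only. It is not the authors' pipeline: it assumes rooted binary trees written as nested tuples, counts duplications with the standard LCA mapping, reduces each species tree to three rosid lineages plus a hypothetical `"outgroup"` tip, and skips bootstrap replicates whose lowest cost is tied (the manuscript does not say how ties were handled). All function and variable names are illustrative.

```python
from collections import Counter


def clusters(tree):
    """Collect the leaf set (cluster) of every node in a nested-tuple tree."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            cluster = frozenset([node])
        else:
            cluster = frozenset().union(*(walk(child) for child in node))
        found.add(cluster)
        return cluster

    walk(tree)
    return found


def duplication_cost(gene_tree, species_tree):
    """Count gene-tree nodes that the LCA mapping flags as duplications."""
    species_clusters = clusters(species_tree)

    def lca(taxa):
        # Smallest species-tree cluster that contains all of `taxa`.
        return min((c for c in species_clusters if taxa <= c), key=len)

    duplications = 0

    def walk(node):
        nonlocal duplications
        if isinstance(node, str):
            return lca(frozenset([node]))
        child_maps = [walk(child) for child in node]
        here = lca(frozenset().union(*child_maps))
        if here in child_maps:  # node maps to the same species node as a child
            duplications += 1
        return here

    walk(gene_tree)
    return duplications


# The three candidate placements of COM, keyed by the Table 3 column names.
CANDIDATES = {
    "Fabidae": ((("COM", "Fabidae"), "Malvidae"), "outgroup"),
    "Malvidae": ((("COM", "Malvidae"), "Fabidae"), "outgroup"),
    "Fabidae + Malvidae": ((("Fabidae", "Malvidae"), "COM"), "outgroup"),
}


def bootstrap_support(replicates):
    """Percent of bootstrap gene trees whose unique lowest-cost topology is each candidate."""
    votes = Counter()
    for gene_tree in replicates:
        costs = {name: duplication_cost(gene_tree, sp) for name, sp in CANDIDATES.items()}
        best = min(costs.values())
        winners = [name for name, cost in costs.items() if cost == best]
        if len(winners) == 1:  # assumption: tied replicates support no topology
            votes[winners[0]] += 1
    return {name: 100 * votes[name] / len(replicates) for name in CANDIDATES}


if __name__ == "__main__":
    with_malvidae = ((("COM", "Malvidae"), "Fabidae"), "outgroup")
    with_fabidae = ((("COM", "Fabidae"), "Malvidae"), "outgroup")
    print(bootstrap_support([with_malvidae] * 3 + [with_fabidae]))
```

In this toy run, three of the four replicate gene trees are cheapest under the 'COM + Malvidae' topology. A gene would be tallied in a Table 3 column when its support for that topology reaches the stated % BS threshold. Duplications + losses and deep coalescence costs would need their own scoring functions, which are not sketched here.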

Supplementary materials

Table S1. Individual gene analyses for COM clade from 78-gene chloroplast, 4-gene mitochondrial and 5-gene nuclear data sets.
Figure S1. RY coding tree from Maximum Likelihood (ML) bootstrap analysis of 78-gene chloroplast data set.
Figure S2. AA tree from Maximum Likelihood (ML) bootstrap analysis of 78-gene chloroplast data set.
Figure S3. Best tree from Maximum Likelihood (ML) bootstrap analysis based on the first and second codon positions only of 78-gene chloroplast data set.
Figure S4. AA tree from Maximum Likelihood (ML) bootstrap analysis of 4-gene mitochondrial data set.
Figure S5. Best tree from Maximum Likelihood (ML) bootstrap analysis based on the first and second codon positions only of 4-gene mitochondrial data set.
Figure S6. RY coding tree from Maximum Likelihood (ML) bootstrap analysis of 5-gene nuclear data set.
Figure S7. AA tree from Maximum Likelihood (ML) bootstrap analysis of 5-gene nuclear data set.
Figure S8. Best tree from Maximum Likelihood (ML) bootstrap analysis based on the first and second codon positions only of 5-gene nuclear data set.

[Figure 1 (tree image): 78-gene chloroplast ML bootstrap consensus tree. Labeled clades: Rosidae, Fabidae, COM, Malvidae.]
[Figure 2 (tree image): 4-gene mitochondrial ML bootstrap consensus tree. Labeled clades: Rosidae, Fabidae, COM, Malvidae.]
[Figure 3 (tree image): 5-gene nuclear ML bootstrap consensus tree. Labeled clades: Rosidae, Fabidae, COM, Malvidae.]
[Figure 4 (diagram): Malvidae ancestor (♀) × Fabidae ancestor (♂), giving rise to the COM ancestor; reticulate phylogeny of Fabidae, COM, and Malvidae showing the nuclear, chloroplast, and mitochondrial gene trees.]

Highlights
• Report phylogenetic conflict of COM in chloroplast, mitochondrial, and nuclear data.
• Results of multi-gene and genomic data show strong evidence for deep incongruence.
• We provide an example for examination of other deep nodes of the tree of life.
• Genomic datasets highlight patterns of deep incongruence in angiosperm phylogeny.
• Stress the complexity of angiosperm evolution, which may be masked by a few genes.

[Diagram (repeat of the reticulation scenario): Malvidae ancestor (♀) × Fabidae ancestor (♂), COM ancestor; Fabidae, COM, and Malvidae with the nuclear, chloroplast, and mitochondrial gene trees. Key: Fabidae chloroplast DNA, Fabidae nuclear DNA, Malvidae nuclear DNA, Malvidae mitochondrial DNA.]