Accepted Manuscript

Deep phylogenetic incongruence in the angiosperm clade Rosidae

Miao Sun, Douglas E. Soltis, Pamela S. Soltis, Xinyu Zhu, J. Gordon Burleigh, Zhiduan Chen

PII: S1055-7903(14)00387-X
DOI: http://dx.doi.org/10.1016/j.ympev.2014.11.003
Reference: YMPEV 5067

To appear in: Molecular Phylogenetics and Evolution

Received Date: 10 June 2014
Revised Date: 1 November 2014
Accepted Date: 5 November 2014

Please cite this article as: Sun, M., Soltis, D.E., Soltis, P.S., Zhu, X., Gordon Burleigh, J., Chen, Z., Deep phylogenetic incongruence in the angiosperm clade Rosidae, Molecular Phylogenetics and Evolution (2014), doi: http://dx.doi.org/10.1016/j.ympev.2014.11.003

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Deep phylogenetic incongruence in the angiosperm clade Rosidae

Miao Sun (a, b), Douglas E. Soltis* (c, d, e), Pamela S. Soltis (d, e), Xinyu Zhu (f), J. Gordon Burleigh (c, e), Zhiduan Chen* (a)

a State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, the Chinese Academy of Sciences, Beijing 100093, China;

b Graduate University of the Chinese Academy of Sciences, Beijing 100039, China;

c Department of Biology, University of Florida, Gainesville, FL 32611, USA;

d Florida Museum of Natural History, University of Florida, Gainesville, FL 32611, USA;

e University of Florida Genetics Institute

f School of Life Science, Nantong University, Nantong 226007, China;

*Corresponding authors:

Zhiduan Chen: [email protected], 8610-62836090

Douglas E. Soltis: [email protected], (1)-352-273-1963

Running title: Deep phylogenetic incongruence in Rosidae

Data Archival Location: Dryad (XXXX)

Abstract

Analysis of large data sets can help resolve difficult nodes in the tree of life and also reveal complex evolutionary histories. The placement of the Celastrales-Oxalidales-Malpighiales (COM) clade within Rosidae remains one of the most confounding phylogenetic questions in angiosperms, with previous analyses placing it with either Fabidae or Malvidae. To elucidate the position of COM, we assembled multi-gene matrices of chloroplast, mitochondrial, and nuclear sequences, as well as large single- and multi-copy nuclear gene data sets. Analyses of multi-gene data sets demonstrate conflict between the chloroplast and both nuclear and mitochondrial data sets, and the results are robust to various character-coding and data-exclusion treatments. Analyses of single- and multi-copy nuclear loci indicate that most loci support the placement of COM with Malvidae, fewer loci support COM with Fabidae, and almost no loci support COM outside a clade of Fabidae and Malvidae. Although incomplete lineage sorting and ancient introgressive hybridization remain as plausible explanations for the conflict among loci, more complete sampling is necessary to evaluate these hypotheses fully. Our results emphasize the importance of genomic data sets for revealing deep incongruence and complex patterns of evolution.

Keywords

Hybridization; introgression; incomplete lineage sorting; COM clade; incongruence; phylogenomics

1. Introduction

Genome-scale data can provide the power to resolve some of the most perplexing parts of the tree of life (e.g., Dunn et al., 2008; Lee et al., 2011; Simon et al., 2012; Smith et al., 2011; Yoder et al., 2013). Furthermore, estimates from numerous independent loci also can reveal phylogenetic incongruence caused by different evolutionary processes, such as gene duplication and loss, recombination, hybridization, lateral gene transfer, or incomplete lineage sorting (e.g., Cui et al., 2013; Degnan and Rosenberg, 2009; Doyle, 1992; Goodman et al., 1979; Hudson, 1983; Maddison, 1997; Oliver, 2013). Molecular phylogenetic analyses have resolved much of the backbone angiosperm phylogeny (e.g., Ruhfel et al., 2014; Soltis et al., 2009, 2011) and clarified long-standing questions regarding relationships within major clades such as monocots (Monocotyledoneae; Chase et al., 2000; Givnish et al., 2006, 2010; Graham et al., 2006; Jerrold et al., 2004; Saarela et al., 2008), asterids (Asteridae; Albach et al., 2001; Bremer et al., 2001, 2004; Hilu et al., 2003; Moore et al., 2011; Olmstead et al., 2000), and rosids (Rosidae; Hilu et al., 2003; Jansen et al., 2007; Moore et al., 2010; Qiu et al., 2010; Soltis et al., 2007, 2011; Wang et al., 2009). Yet much of this work is based either largely or exclusively on chloroplast sequence data, which represent a single, linked, and usually maternally inherited locus. New sequencing technologies make it feasible to obtain data sets of numerous independent nuclear loci, which can be used to evaluate results from analyses of chloroplast gene sequence data and reveal phylogenetic conflict among loci (e.g., Burleigh et al., 2011; Duarte et al., 2010; Lee et al., 2011; Xi et al., 2014; Zeng et al., 2014).

Introgressive hybridization has played an important role in plant evolution, and incomplete lineage sorting also likely occurred during some rapid radiations. Consequently, there are numerous examples of discordance between chloroplast and nuclear gene trees in plants (e.g., Acosta and Premoli, 2010; Okuyama et al., 2005; Rieseberg and Soltis, 1991; Rieseberg and Wendel, 1993; Rieseberg et al., 1995, 1996a; Soltis and Kuzoff, 1995; Soltis and Soltis, 2009; Tsitrone et al., 2003; Wendel et al., 1995; Xi et al., 2014). Although phylogenetic analyses of angiosperm backbone relationships based on nuclear, mitochondrial, and chloroplast loci have largely agreed, one major point of conflict is the placement of COM (Celastrales-Oxalidales-Malpighiales; Endress and Matthews, 2006; Zhu et al., 2007) within the large Rosidae clade.

Rosidae comprise approximately one quarter of all angiosperm species, which are morphologically diverse, exhibit extraordinary heterogeneity in habit, habitat, and life form, and include most temperate and tropical forest trees (Wang et al., 2009). Some members possess novel biochemical pathways (e.g., production of glucosinolates and cyanogenic glycosides for defense), and many are important crops (e.g., Fabaceae and Rosaceae). Symbioses with nitrogen-fixing bacteria are largely confined to this clade as well. Resolving relationships within Rosidae has been difficult (e.g., Hilu et al., 2003; Jansen et al., 2007; Lee et al., 2011; Moore et al., 2010, 2011; Qiu et al., 2010; Ruhfel et al., 2014; Soltis et al., 2005, 2007, 2011; Wang et al., 2009; Zhu et al., 2007) due to a series of rapid radiations (Wang et al., 2009). However, multi-gene studies have recovered two major, well-supported clades: the Fabidae (i.e., eurosids I, fabids) and Malvidae (i.e., eurosids II, malvids) (Hilu et al., 2003; Judd and Olmstead, 2004; Moore et al., 2010, 2011; Soltis et al., 1999, 2000, 2005, 2007, 2011; Wang et al., 2009; Xi et al., 2014).

COM contains approximately one third of all Rosidae, 870 genera and ~19,000 species (APG III, 2009). Molecular analyses, largely dominated by chloroplast genes, have usually placed COM with Fabidae (Table 1; e.g., Burleigh et al., 2009; Hilu et al., 2003; Jansen et al., 2007; Moore et al., 2010, 2011; Soltis et al., 2005, 2007, 2011; Wang et al., 2009). Analyses of the mitochondrial gene matR first suggested the placement of COM with Malvidae (Zhu et al., 2007), and subsequent studies based on nuclear or mitochondrial genes supported this placement, although typically with limited taxon sampling (Table 1; Burleigh et al., 2011; Duarte et al., 2010; Finet et al., 2010; Lee et al., 2011; Morton, 2011; Qiu et al., 2010; Shulaev et al., 2010; Xi et al., 2014; Zhang et al., 2012). Several floral characters also appear to link COM with Malvidae. For example, in COM and Malvidae species, the inner integument of the ovule is thicker than the outer integument at the time of fertilization, a feature that is extremely rare in Fabidae and other eudicots. Additionally, contorted petals and a tendency towards polystemony and polycarpy also suggest a placement of COM members with Malvidae rather than with Fabidae (Endress and Matthews, 2006; Endress et al., 2013).

Although analyses of chloroplast gene sequence data generally appear to conflict with analyses of mitochondrial and nuclear gene sequence data, these studies often differ greatly in taxon sampling and analytical methods (Table 1; but see Xi et al., 2014). Thus, it is unclear whether the different placements of COM are due to errors in the analyses or biological incongruence among loci. The level of incongruence within the nuclear genome also is unknown. We use COM as an exemplar to investigate phylogenetic incongruence at deep levels in angiosperm phylogeny. Specifically, we first compare phylogenetic results from chloroplast, mitochondrial, and nuclear data sets having similar taxon sampling and examine whether the results are robust to various character-coding and data-exclusion protocols. We also survey large-scale nuclear data sets of both single-copy and multi-copy genes to investigate the patterns of phylogenetic discordance within the nuclear genome and then discuss whether these patterns are consistent with incomplete lineage sorting (i.e., deep coalescence) (Maddison, 1997; Maddison and Knowles, 2006; Page and Charleston, 1998) or ancient hybridization and introgression (Chang et al., 2011; Cui et al., 2013; Linder and Rieseberg, 2004; Tsitrone et al., 2003; Zhang et al., 2014).

2. Materials and methods

Throughout this paper, to facilitate discussion, we treat COM, Fabidae, and Malvidae as three separate groups, despite current classifications that consider COM to be part of Fabidae (APG III, 2009; Cantino et al., 2007).

2.1 Phylogenetic analyses of chloroplast, mitochondrial, and nuclear data

To compare the placement of COM in analyses of chloroplast, mitochondrial, and nuclear gene data sets, we assembled published matrices with similar taxon sampling. For the chloroplast gene sequence data, we pruned the 78-gene chloroplast data set of Ruhfel et al. (2014) to 82 seed plant taxa. We also used the 92-taxon, 5-gene nuclear data set of Zhang et al. (2012), and the 79-taxon, 4-gene mitochondrial matrix of Qiu et al. (2010). The taxon sampling in all of these studies was designed to reconstruct relationships across angiosperms using representative sampling of major clades, including COM. We used the nuclear gene sequence data set of Zhang et al. (2012) to guide our assembly of chloroplast and mitochondrial gene data sets, attempting to ensure as much as possible that the taxa employed from these data sets are from the same species or genus.

We performed a series of Maximum Likelihood (ML) phylogenetic analyses on each of the three data sets using RAxML v.7.2.8 (Stamatakis, 2008). For all analyses, we estimated the optimal ML tree and performed 100 nonparametric bootstrap (BS) replicates. First, we analyzed the full nucleotide alignments using an unpartitioned GTRCAT model. We also examined the variation of COM placement in single-gene topologies inferred from these three data sets using RAxML with the GTRCAT model. For the three multi-gene data sets, we also analyzed the amino acid (AA) alignment using the PROTCATJTT model (Jones et al., 1992). RY coding, which recodes the nucleotides as binary characters, either purines (A or G = R) or pyrimidines (C or T = Y), has been used to ameliorate biases caused by saturation, rate heterogeneity, and base composition (Delsuc et al., 2005; Gibson et al., 2005; Harrison et al., 2004; Phillips et al., 2004; Phillips and Penny, 2003). Thus, we also transformed the three full nucleotide matrices to RY coding and ran an ML analysis using the GTRCAT model.

The elimination of potentially misleading sites from an alignment is a common practice in phylogenetic analysis (e.g., Delsuc et al., 2005; Philippe et al., 2005; Rajan, 2013; Regier and Zwick, 2011). We used two methods to remove highly variable sites as a further means of exploring the data that may contribute to the discordant placements of COM. First, following Goremykin et al. (2010), we organized the nucleotide sites in each alignment in order of rate based on the observed variability (OV) criterion. For each sorted alignment, we then removed the most variable 5%, 10%, 20%, 30%, 40%, and 50% of the sites. After each removal, we performed an ML analysis on the remaining sites in each alignment using the GTRCAT model. Second, we excluded the third codon positions and analyzed the alignments of the first and second codon positions only using RAxML with the GTRCAT model.
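The recoding and data-exclusion treatments described in this section can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the function names and the toy alignment are hypothetical, and `site_variability` is a simple pairwise-mismatch score standing in for the OV criterion of Goremykin et al. (2010), whose exact definition should be taken from that paper. The sketch also assumes an in-frame alignment, so that every third column is a third codon position.

```python
from itertools import combinations

RY = {"A": "R", "G": "R", "C": "Y", "T": "Y"}

def ry_recode(seq):
    """Recode purines (A, G) as R and pyrimidines (C, T) as Y; keep other symbols."""
    return "".join(RY.get(base, base) for base in seq.upper())

def drop_third_positions(seq):
    """Keep only first and second codon positions of an in-frame sequence."""
    return "".join(base for i, base in enumerate(seq) if i % 3 != 2)

def site_variability(column):
    """Fraction of pairwise comparisons in one alignment column that mismatch.

    Gaps and ambiguity codes are ignored. Assumption: this is only a
    stand-in for the OV criterion, not a verified reimplementation.
    """
    bases = [b for b in column.upper() if b in "ACGT"]
    pairs = list(combinations(bases, 2))
    if not pairs:
        return 0.0
    return sum(x != y for x, y in pairs) / len(pairs)

def remove_most_variable(alignment, fraction):
    """Drop the given fraction of columns with the highest variability score."""
    names = list(alignment)
    length = len(alignment[names[0]])
    scores = [
        site_variability("".join(alignment[n][i] for n in names))
        for i in range(length)
    ]
    n_drop = int(length * fraction)
    ranked = sorted(range(length), key=lambda i: scores[i], reverse=True)
    dropped = set(ranked[:n_drop])
    return {
        n: "".join(b for i, b in enumerate(alignment[n]) if i not in dropped)
        for n in names
    }

# Hypothetical three-taxon, two-codon alignment.
toy = {"taxon1": "ATGACA", "taxon2": "ATGGCA", "taxon3": "ATGTCC"}
recoded = {name: ry_recode(seq) for name, seq in toy.items()}
first_second = {name: drop_third_positions(seq) for name, seq in toy.items()}
trimmed = remove_most_variable(toy, 0.20)
```

Each treated alignment would then be passed to the ML program in place of the full nucleotide matrix; the treatments are independent of one another, which is why they are written as separate functions.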

2.2 Single-copy nuclear gene analysis

The largest nuclear gene data set used to resolve the backbone of angiosperm relationships comprises 22,833 groups of orthologs (Lee et al., 2011). Although this data set includes only seven species of Malpighiales representing COM (no Celastrales or Oxalidales species were included), it provides estimates from by far the greatest number of presumably independent nuclear loci for the placement of COM. We examined the individual gene trees from this data set to look for variation in the placement of COM. First, we divided the full, concatenated nucleotide alignment from Lee et al. (2011; available on the BIGPLANT website: http://nybg.bio.nyu.edu/) into separate alignments, each representing a set of putative orthologs. Next we identified the ortholog sets that were potentially informative regarding the placement of COM; these alignments contained at least one COM species, one Malvidae, one Fabidae, and one other species not in any of these groups. In all, 8,445 of the ortholog sets were potentially informative regarding the placement of COM. For each potentially informative ortholog set alignment, we performed 100 ML bootstrap replicates using RAxML v.7.2.8 with the GTRCAT model (Stamatakis, 2008), and we counted how many bootstrap replicates support a clade of COM and Fabidae species, how many support a clade of COM and Malvidae species, and how many support COM outside a clade of Fabidae and Malvidae. The analysis of the support for the COM placement was automated using Perl scripts and Newick utilities (Junier and Zdobnov, 2010).
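The filtering and counting steps can be sketched in Python as follows. This is only an illustration under stated assumptions, not the Perl and Newick utilities workflow used here: the function names and the small taxon sets are hypothetical, bootstrap trees are assumed to be rooted Newick strings, and a placement is counted only when the sampled species of the two groups form an exact clade, which may be stricter than the criterion actually applied.

```python
import re
from collections import Counter

def is_informative(taxa, com, fabidae, malvidae):
    """True if an alignment has at least one COM, Fabidae, Malvidae, and other species."""
    taxa = set(taxa)
    others = taxa - com - fabidae - malvidae
    return all([taxa & com, taxa & fabidae, taxa & malvidae, others])

def clades_from_newick(newick):
    """Return the leaf set of every internal node of a rooted Newick tree."""
    tokens = re.findall(r"[(),;]|[^(),;]+", newick.strip())
    stack, found, prev = [], set(), None
    for tok in tokens:
        if tok == "(":
            stack.append(set())
        elif tok == ")":
            leaves = frozenset(stack.pop())
            found.add(leaves)
            if stack:
                stack[-1].update(leaves)
        elif tok not in ",;" and prev != ")":
            # A leaf label, possibly followed by ":branch_length".
            name = tok.split(":")[0].strip()
            if name and stack:
                stack[-1].add(name)
        # Tokens that directly follow ")" are support values or branch lengths.
        prev = tok
    return found

def classify(newick, com, fabidae, malvidae):
    """Name the COM placement shown by one rooted tree, or None if unresolved."""
    clades = clades_from_newick(newick)
    if not clades:
        return None
    taxa = max(clades, key=len)  # the root clade holds every sampled taxon
    c, f, m = taxa & com, taxa & fabidae, taxa & malvidae
    if c | f in clades:
        return "COM+Fabidae"
    if c | m in clades:
        return "COM+Malvidae"
    if f | m in clades:
        return "COM outside"
    return None

def tally(bootstrap_trees, com, fabidae, malvidae):
    """Count bootstrap replicates supporting each placement."""
    return Counter(classify(t, com, fabidae, malvidae) for t in bootstrap_trees)

# Hypothetical group membership and replicate trees, for illustration only.
COM = {"Populus", "Manihot"}
FAB = {"Glycine", "Medicago"}
MAL = {"Arabidopsis", "Carica"}
t_mal = "(((Populus,Manihot),(Arabidopsis,Carica)),(Glycine,Medicago),Oryza);"
t_fab = "(((Populus:0.1,Manihot:0.1)90:0.05,(Glycine:0.1,Medicago:0.2)):0.1,(Arabidopsis,Carica),Oryza);"
t_out = "((Populus,Manihot),((Arabidopsis,Carica),(Glycine,Medicago)),Oryza);"
counts = tally([t_mal, t_mal, t_fab], COM, FAB, MAL)
```

A dedicated tree library would normally replace the small parser; it is included only so that the sketch runs on its own.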

2.3 Multi-copy nuclear gene analysis

We also examined support for the placement of COM using multi-copy nuclear gene families, i.e., gene families that may have multiple sequences from one or more taxa. In contrast to trees from single-copy sets of orthologs, a phylogenetic tree inferred from a multi-copy gene family is not always straightforward to interpret. For example, in a multi-copy gene tree, a species from COM could have one sequence that groups with Fabidae and one sequence that groups with Malvidae. To solve this problem, we estimated the reconciliation cost of each gene family tree given each of three species tree topologies: COM sister to Fabidae, COM sister to Malvidae, and COM sister to a Fabidae + Malvidae clade. We used three different reconciliation costs, each implying a different evolutionary scenario: 1) the minimum number of implied gene duplications; 2) the minimum number of implied duplications and losses; and 3) the minimum number of implied deep coalescence events (e.g., Maddison, 1997). We used a parsimony criterion to distinguish among the three species tree topologies; the topology with the lowest reconciliation cost, i.e., the topology that implies the fewest evolutionary events, is the topology that is supported by the gene family. If two or three of the topologies have equal reconciliation costs, the gene family is considered uninformative regarding the placement of COM.
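To make the parsimony criterion concrete, the following Python sketch computes one of the three costs, the number of implied duplications, by the standard least-common-ancestor mapping of gene tree nodes onto a species tree, and then applies the tie rule described above. It is an assumption-laden illustration rather than the software used in this study: trees are written as nested tuples, gene tree leaves are labelled directly with the group they come from, and all names (`duplication_cost`, `supported_topology`, `SPECIES_TREES`) are hypothetical.

```python
def leaf_set(tree):
    """Set of labels under a nested-tuple tree; a leaf is a plain string."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))

def species_clades(species_tree):
    """All clades (leaf sets) of the species tree, single leaves included."""
    if isinstance(species_tree, str):
        return [frozenset([species_tree])]
    clades = [leaf_set(species_tree)]
    for child in species_tree:
        clades.extend(species_clades(child))
    return clades

def lca_clade(labels, clades):
    """Smallest species tree clade that contains every label in `labels`."""
    return min((c for c in clades if labels <= c), key=len)

def duplication_cost(gene_tree, species_tree):
    """Number of gene tree nodes that map to the same clade as one of their children."""
    clades = species_clades(species_tree)

    def walk(node):
        if isinstance(node, str):
            return 0
        here = lca_clade(leaf_set(node), clades)
        is_dup = any(lca_clade(leaf_set(ch), clades) == here for ch in node)
        return int(is_dup) + sum(walk(ch) for ch in node)

    return walk(gene_tree)

def supported_topology(costs):
    """Topology with the strictly lowest cost, or None when the minimum is tied."""
    lowest = min(costs.values())
    winners = [name for name, cost in costs.items() if cost == lowest]
    return winners[0] if len(winners) == 1 else None

# The three candidate species trees, reduced to one leaf per group.
SPECIES_TREES = {
    "COM+Fabidae": ((("COM", "Fabidae"), "Malvidae"), "outgroup"),
    "COM+Malvidae": ((("COM", "Malvidae"), "Fabidae"), "outgroup"),
    "COM outside": (("COM", ("Fabidae", "Malvidae")), "outgroup"),
}

def score(gene_tree):
    """Duplication cost of one rooted gene tree under each candidate species tree."""
    return {name: duplication_cost(gene_tree, st) for name, st in SPECIES_TREES.items()}

# Hypothetical gene trees: a single-copy family and a family with two COM copies.
single_copy = ((("COM", "Malvidae"), "Fabidae"), "outgroup")
two_copies = ((("COM", "Fabidae"), ("COM", "Malvidae")), "outgroup")
print(supported_topology(score(single_copy)))
print(supported_topology(score(two_copies)))
```

The second gene tree mirrors the example in the text, with one COM copy grouping with Fabidae and one with Malvidae; in this toy case the duplication cost is tied across the three species trees, so the family would be treated as uninformative.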

We assembled a collection of 3,748 gene family alignments obtained from the genome sequences of 22 plant taxa with OrthoMCL (Chen et al., 2006) and aligned with MAFFT (Katoh et al., 2005). Included are Selaginella moellendorffii, Physcomitrella patens, and 20 angiosperm species, including one species representing COM, Populus trichocarpa. Although the taxon sampling is sparse, using only sequences from completely sequenced genomes may enable more accurate estimates of processes such as gene loss than incomplete transcriptome data sets.

For each multi-copy gene alignment, we performed 100 bootstrap replicates using RAxML v.7.2.8 with the GTRCAT model (Stamatakis, 2008). For each of the resulting bootstrap trees, we calculated the reconciliation cost under the three different cost models (duplications, duplications and losses, and deep coalescence) using a species tree in which Populus (COM clade) was sister to Fabidae (Fragaria vesca, Medicago truncatula, and Glycine max), one in which Populus was sister to Malvidae (Arabidopsis thaliana, Thellungiella parvula, Carica papaya, and Theobroma cacao), and one in which Populus was sister to Fabidae + Malvidae. The rooting of gene trees can greatly affect the estimates of the reconciliation cost, and it is often difficult to infer the root of a multi-copy gene tree. Thus, for each gene tree, we used a rooting that minimized the reconciliation cost. We calculated the reconciliation costs for each gene tree bootstrap replicate under three possible species trees using the program OptRoot, written by Andre Wehe and available at http://www.wehe.us/optroot.html.
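The way rooting and bootstrap replicates enter the comparison can be sketched as below. The sketch does not reproduce OptRoot or its interface: reconciliation costs are supplied as plain numbers, one per candidate rooting, and all names and values are hypothetical. It also assumes that a gene family is counted as supporting a species tree when at least a chosen fraction of its bootstrap replicates favor that tree, which is one plausible reading of a minimum bootstrap support level.

```python
from collections import Counter

def replicate_choice(rooting_costs):
    """Pick the species tree favored by one bootstrap gene tree.

    `rooting_costs` maps each species tree to a list of reconciliation
    costs, one per candidate rooting of the gene tree. The rooting with
    the lowest cost is used for each species tree; ties between species
    trees give None (uninformative).
    """
    best = {topo: min(costs) for topo, costs in rooting_costs.items()}
    lowest = min(best.values())
    winners = [topo for topo, cost in best.items() if cost == lowest]
    return winners[0] if len(winners) == 1 else None

def family_support(replicates, cutoff):
    """Species tree favored by at least `cutoff` (a fraction) of replicates, else None."""
    counts = Counter(replicate_choice(r) for r in replicates)
    counts.pop(None, None)  # uninformative replicates support nothing
    if not counts:
        return None
    topo, n = counts.most_common(1)[0]
    return topo if n / len(replicates) >= cutoff else None

# Hypothetical costs for four bootstrap replicates of one gene family,
# each evaluated at three candidate rootings.
replicates = [
    {"COM+Fabidae": [4, 3, 5], "COM+Malvidae": [2, 3, 4], "COM outside": [3, 3, 6]},
    {"COM+Fabidae": [5, 4, 4], "COM+Malvidae": [3, 5, 4], "COM outside": [4, 4, 4]},
    {"COM+Fabidae": [2, 6, 3], "COM+Malvidae": [3, 3, 3], "COM outside": [4, 5, 3]},
    {"COM+Fabidae": [3, 3, 3], "COM+Malvidae": [3, 4, 5], "COM outside": [3, 6, 4]},
]
print(family_support(replicates, cutoff=0.5))
print(family_support(replicates, cutoff=0.7))
```

Taking the minimum over rootings before comparing species trees means that no single root has to be assumed for a multi-copy gene tree, at the price of always giving each species tree its most favorable rooting.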


All data sets and results are available on Dryad (XXX; www.datadryad.org).

3. Results

3.1 Chloroplast, mitochondrial, and nuclear data sets

ML analyses of the chloroplast, mitochondrial, and nuclear multi-gene alignments with similar taxon sampling recover different placements of COM (Figures 1–3). We focus on the relationships among members of Rosidae, but all the trees generated in our analyses in the present study are available as supplemental data and on Dryad (XXXX).

The phylogeny based on the 82-taxon, 78-gene chloroplast data set largely agrees with conclusions from previous chloroplast-dominated studies (APG III, 2009; Moore et al., 2010; Ruhfel et al., 2014; Soltis et al., 2011; Wang et al., 2009), supporting the placement of COM with Fabidae (Figure 1). COM received 100% BS support, as did a clade of all COM and Fabidae species, but the precise placement of COM was uncertain. There was 52% BS support for a sister relationship of COM and all Fabidae except Bulnesia (Zygophyllales; Figure 1), which was sister to COM and other Fabidae species. Most chloroplast genes support a placement of COM with Fabidae, albeit generally with low BS support, and no chloroplast gene provides even 50% BS support for COM with Malvidae (Table S1). The analysis of the full chloroplast alignment with RY coding indicates 100% BS support for a clade of COM and Fabidae species (Figure S1). AA coding indicates 47% BS support for a clade of the COM species and all Fabidae except Bulnesia, which is placed in Malvidae, although with low support (Figure S2). Removing the highly variable nucleotide sites from the chloroplast alignment quickly erodes support for COM with Fabidae. Bootstrap support for COM with Fabidae drops from 98% to 48% to 1% with the removal of the 5%, 10%, and 20% most variable sites, respectively, with no support after removing more sites. However, none of the site removal analyses indicates any support for a clade of COM and Malvidae or COM outside of Fabidae + Malvidae. Removing the third codon positions from the 78-gene chloroplast data set reduced the BS support for a clade of COM and Fabidae species to 90%, with Bulnesia initially sister to COM (Figure S3).

Trees from analyses of the 79-taxon, 4-gene mitochondrial data set generally indicate a close relationship of COM with species from Malvidae (Figure 2). In the ML analysis of the full nucleotide data set, there is 94% BS support for a clade of COM species and all Malvidae except Stachyurus and Oenothera (Figure 2). Additionally, Guaiacum (Zygophyllales, Fabidae) is sister to Stachyurus (Crossosomatales, Malvidae) in agreement with Qiu et al. (2010). However, the placement of Guaiacum differs from those obtained in studies based largely on chloroplast genes (see APG III, 2009; Soltis et al., 2011). Analyses of the four individual mitochondrial genes either show weak (< 60%) BS support linking COM with Malvidae or yield trees that are unresolved, with little, if any, support for the monophyly of either clade or Rosidae (Table S1). RY coding greatly reduces support for relationships within Rosidae, with no support even for the monophyly of COM (not shown). AA coding indicates 74% BS support for a clade of COM species and all Malvidae except Stachyurus and Oenothera (Figure S4). Removing the most variable 5% of sites yields 100% BS support for a clade of COM species and all Malvidae except Stachyurus and Oenothera. However, removing more variable sites greatly reduces support for relationships throughout the tree; after removing the 10% most variable sites, BS support for COM drops to 23%. After removing the third codon position from the 4-gene mitochondrial data set, there is 31% BS support for a clade of COM species and all Malvidae except Stachyurus and Oenothera (Figure S5).

The results from analyses of the 92-taxon, 5-gene nuclear data set provide 100% BS support for a clade that includes COM species and all species of Malvidae except Pelargonium, Oenothera, and Lagerstroemia (Figure 3), as do the results from ML analyses of the RY and AA matrices (Figures S6, S7). This placement of COM with Malvidae is also in agreement with the ML analyses of the five individual nuclear genes, although with different levels of support (Table S1). Likewise, the ML analyses of nucleotides after removing 5% and 10% of the most variable sites yield 100% BS support for a clade that includes all of the COM species and Malvidae species, except Pelargonium, Oenothera, and Lagerstroemia. Removing 20% or 30% of the most variable sites reduces the support for this clade to 96% and 94%, respectively, but removing more sites greatly reduces support for relationships within Rosidae in general, including support for the monophyly of COM. Removing the third codon position from the 5-gene nuclear data set still resulted in 100% BS support for a clade of COM and all Malvidae species except Pelargonium, Oenothera, and Lagerstroemia (Figure S8).

3.2 Single-copy nuclear gene analysis

Although most of the orthologous gene sets from the Lee et al. (2011) analysis were not informative regarding the placement of COM, among those genes that do support one of the three possible placements (COM with Fabidae, COM with Malvidae, or COM outside of Fabidae + Malvidae), 62–75% support a clade of COM with Malvidae, 25–38% support a clade of COM with Fabidae, and none of the orthologous gene sets support COM outside of Fabidae + Malvidae (Table 2). While increasing the minimum bootstrap support cutoff reduces the number of orthologous gene sets supporting each hypothesis, it has relatively little effect on the percentage of informative genes supporting COM with Malvidae versus COM with Fabidae (Table 2).
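The relationship between the bootstrap cutoff and the percentages among informative genes can be sketched in Python as follows. The per-gene counts and the cutoffs below are hypothetical and are not taken from Table 2; the sketch also assumes cutoffs above 50% of replicates, so that at most one placement can qualify per gene.

```python
from collections import Counter

def percent_by_cutoff(per_gene_counts, cutoffs):
    """Percentage of informative genes supporting each placement, per cutoff.

    `per_gene_counts` is a list with one dict per gene, mapping each
    placement to the number of bootstrap replicates (out of 100) that
    support it. A gene is informative at a cutoff when its best-supported
    placement reaches that many replicates.
    """
    table = {}
    for cutoff in cutoffs:
        supported = Counter()
        for counts in per_gene_counts:
            placement, n = max(counts.items(), key=lambda kv: kv[1])
            if n >= cutoff:
                supported[placement] += 1
        total = sum(supported.values())
        table[cutoff] = (
            {p: 100.0 * k / total for p, k in supported.items()} if total else {}
        )
    return table

# Hypothetical replicate counts for four genes.
genes = [
    {"COM+Malvidae": 80, "COM+Fabidae": 15, "COM outside": 5},
    {"COM+Malvidae": 60, "COM+Fabidae": 40, "COM outside": 0},
    {"COM+Malvidae": 10, "COM+Fabidae": 85, "COM outside": 5},
    {"COM+Malvidae": 40, "COM+Fabidae": 35, "COM outside": 25},
]
table = percent_by_cutoff(genes, cutoffs=(51, 70))
```

Raising the cutoff shrinks the denominator (fewer genes count as informative) as well as the numerators, which is why the percentages can stay fairly stable even as the number of supporting genes falls.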

3.3 Multi-copy nuclear gene analysis

Similar to the single-copy nuclear gene analysis, most of the multi-copy gene trees were not informative regarding the placement of COM, but the majority of informative genes support the placement of COM with Malvidae. Between 71% and 98% of the informative genes support a clade of COM and Malvidae species, depending on the model of reconciliation and the minimum bootstrap support level (Table 3). The duplication-only model provides the strongest support for a clade of COM with Malvidae (≥ 91%; Table 3). The maximum percentage of informative genes supporting COM with Fabidae is 27%, which is based on the deep coalescence reconciliation model (Table 3). A clade of COM outside of Fabidae + Malvidae is recovered by 0–6% of the genes in these analyses (Table 3).

4. Discussion

4.1 Conflict among multi-locus phylogenetic analyses

In spite of much recent progress resolving the angiosperm tree of life, the phylogenetic placement of COM remains uncertain. Most previous efforts to place COM have used a variety of data sources, taxon sampling strategies, and phylogenetic methods (but see Xi et al., 2014). Therefore, it is difficult to determine if the conflicting placements of COM are due to errors or actual biological conflict among loci (Table 1). Our ML analyses of multi-gene chloroplast, mitochondrial, and nuclear data sets with similar taxon sampling reinforce the observation that analyses of chloroplast loci yield a topology that differs from analyses of mitochondrial and most nuclear loci (Table 1; Figures 1–3). The isolated placements of Lagerstroemia, Oenothera, Pelargonium, and Stachyurus make Malvidae non-monophyletic in some of our analyses of mitochondrial and nuclear data sets; however, their positions in our trees are consistent with those of Qiu et al. (2010) and Zhang et al. (2012), respectively. Furthermore, these four genera are from Myrtales (Lagerstroemia, Lythraceae; Oenothera, Onagraceae), Geraniales (Pelargonium, Geraniaceae), and Crossosomatales (Stachyurus, Stachyuraceae), orders whose exact placement within Rosidae varies among chloroplast, mitochondrial, and nuclear trees (e.g., Morton, 2011; Qiu et al., 2010; Soltis et al., 2011; Xi et al., 2014; Zhang et al., 2012; Zhu et al., 2007).

Our analyses of the chloroplast, mitochondrial, and nuclear data sets are robust to different character-coding strategies, which are often used to detect heterogeneous phylogenetic signals or error. AA matrices and RY coding are used to ameliorate nucleotide saturation and composition biases (Delsuc et al., 2005; Gibson et al., 2005; Harrison et al., 2004; Hashimoto et al., 1995; Phillips and Penny, 2003), and removal of highly variable sites has been proposed to reduce long-branch attraction or model-fitting error (see Philippe et al., 2005). Some of these experiments erode phylogenetic signal for the placement of COM, but none support an alternative placement of COM. Although we failed to find obvious signs of major systematic or sampling biases, it is difficult to demonstrate the absence of error. In fact, the (weakly supported) variation in single-gene topologies of linked chloroplast genes suggests that some level of error may be present in chloroplast gene sequence analyses (Table S1). Nonetheless, the consistency of the incongruence suggests that there may be an underlying biological basis to the conflict among chloroplast, nuclear, and mitochondrial loci.

4.2 Evolutionary patterns suggested by conflict among nuclear loci

If the conflict among chloroplast, nuclear, and mitochondrial gene sequence data is due to evolutionary events such as ancient hybridization or incomplete lineage sorting, we would also expect to see conflict among independent nuclear loci. Indeed, within the single-copy nuclear gene data set from Lee et al. (2011), on average 66% of the informative genes support a placement of COM with Malvidae, while on average 34% weakly support the placement of COM with Fabidae (Table 2). The multi-copy genes reveal similar levels of incongruence, with at least 71% of the informative genes supporting COM with Malvidae, with far less support for COM with Fabidae and very little support for COM outside a clade of Malvidae + Fabidae (Table 3). This predominant placement of COM with Malvidae within multi-copy gene trees is consistent with previous gene tree parsimony analyses (Burleigh et al., 2011; Górecki et al., 2012). The placement of COM from multi-copy genes is robust to the model of gene reconciliation (Table 3). Furthermore, in both the single- and multi-copy gene results, the overall percentage of informative genes supporting each of the three hypotheses is relatively stable no matter the bootstrap cutoff we use (Tables 2, 3).

If the differences in the position of COM among nuclear loci are not due to errors, they may reflect biological processes such as ancient hybridization and/or incomplete lineage sorting. Distinguishing between incomplete lineage sorting and hybridization can be challenging (e.g., Buckley et al., 2006; Joly et al., 2009; Holder et al., 2001; Holland et al., 2008; Sang and Zhong, 2000), and the sparse and incomplete taxon sampling within our nuclear gene data sets, as well as the ancient divergence time of the major rosid lineages (Bell et al., 2010; Wang et al., 2009), makes it especially difficult to differentiate between the two. Although the effects of incomplete lineage sorting typically are studied in recent radiations, they can also obscure the resolution of ancient radiations (Oliver, 2013; Whitfield and Lockhart, 2007), such as the deep relationships among mammals (McCormack et al., 2012; Song et al., 2012). If we consider the placement of COM as a rooted 3-taxon (COM, Fabidae, and Malvidae) phylogenetic problem, a process of incomplete lineage sorting should yield approximately equal numbers of nuclear genes supporting the two possible non-species tree topologies (Huson et al., 2005). Instead, we see that the majority of genes support COM with Malvidae, with far less support for COM with Fabidae, and almost no support for COM outside of Fabidae + Malvidae (Tables 2, 3). The differences in levels of support from nuclear loci suggest that incomplete lineage sorting does not explain the phylogenetic discordance among genes. However, because Malvidae and Fabidae are not always recovered as clades (Figures 1–3), it is possible that this 3-taxon case does not apply, and the expected patterns of support from lineage sorting with more than three species are more complex (e.g., Rosenberg and Nordborg, 2002; Degnan and Rosenberg, 2006).
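The 3-taxon expectation invoked above can be illustrated with a short sketch. Under the standard coalescent model for a rooted three-taxon species tree with an internal branch of length T (in coalescent units), the gene tree matching the species tree has probability 1 - (2/3)exp(-T), and each of the two discordant topologies has probability (1/3)exp(-T). The Python sketch below computes these expectations and applies an exact two-sided binomial test to ask whether the two least-supported topologies are equally frequent. The function names and the gene counts are hypothetical and chosen only for illustration; they are not the values reported in Tables 2 and 3, and the test ignores gene tree estimation error.

```python
from math import comb, exp


def expected_topology_probs(t: float) -> tuple[float, float, float]:
    """Coalescent expectations for a rooted 3-taxon species tree.

    t is the internal branch length in coalescent units (an assumed input).
    Returns (P(concordant), P(discordant 1), P(discordant 2)).
    """
    discordant = exp(-t) / 3.0
    return 1.0 - 2.0 * discordant, discordant, discordant


def minor_topology_symmetry_pvalue(n_alt1: int, n_alt2: int) -> float:
    """Exact two-sided binomial test (p = 0.5) that the two minority
    topologies are supported by equal numbers of genes."""
    n = n_alt1 + n_alt2
    if n == 0:
        return 1.0
    # With p = 0.5 every outcome k has probability comb(n, k) / 2**n,
    # so outcomes can be ranked by comb(n, k) using exact integers.
    observed = comb(n, n_alt1)
    tail = sum(comb(n, k) for k in range(n + 1) if comb(n, k) <= observed)
    return tail / 2 ** n


# Hypothetical counts of informative genes per topology (illustration only).
counts = {"COM+Malvidae": 60, "COM+Fabidae": 25, "COM sister to both": 3}
minor = sorted(counts.values())[:2]  # the two least-supported topologies
p_value = minor_topology_symmetry_pvalue(minor[0], minor[1])
print(f"P(symmetry of minority topologies) = {p_value:.2e}")
```

A small p-value in such a test would indicate that the two minority topologies are not equally frequent, which is the asymmetry that the argument above treats as evidence against incomplete lineage sorting in the simple 3-taxon case.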

Many plant lineages have experienced hybridization and introgression throughout their evolutionary histories (e.g., Okuyama et al., 2005), and there are more than a hundred records of interspecific hybridization among rosid taxa alone (Rieseberg and Soltis, 1991; Rieseberg et al., 1996a). An ancient introgressive hybridization event would likely produce conflict among independent loci (Wendel and Doyle, 1998). The different placement of COM in trees constructed from mitochondrial and chloroplast gene sequence data suggests that the evolutionary histories of these two subcellular compartments are unlinked, with the chloroplast genome derived from the Fabidae lineage and the mitochondrial genome from the Malvidae lineage. This result is unexpected given that the chloroplast and mitochondrial genomes typically are both maternally inherited in angiosperms (Birky, 1995, 2001; Corriveau and Coleman, 1988; Mogensen, 1996). However, there are documented cases of biparental inheritance of organellar genomes (e.g., Fauré et al., 1994; Havey et al., 1998; Testolin and Cipriani, 1997; Yang et al., 2000), and paternal inheritance of chloroplast genomes has been documented in COM species (Turnera ulmifolia: Shore et al., 1994; Shore and Triassi, 1998) and Fabidae (Medicago sativa: Masoud et al., 1990; Schumann and Hancock, 1989; Larrea: Yang et al., 2000). Additionally, empirical studies suggest that progeny from a hybridization event may exhibit strong paternal chloroplast inheritance while mitochondrial inheritance remains exclusively maternal (Schumann and Hancock, 1989; Masoud et al., 1990; Shore et al., 1994; Xu, 2005). Thus, it is conceivable that an ancient hybridization event resulted in different evolutionary histories for the chloroplast and mitochondrial genomes.

In this putative ancient hybridization scenario, an early member of Fabidae or its immediate ancestor acted as the paternal parent and crossed with the maternal lineage of a member of Malvidae, with accompanying paternal transmission of the chloroplast to the ancestor of COM (F1). This event could have created conflicting histories in the chloroplast and mitochondrial genomes and conflict among nuclear loci, with half of the alleles in the F1 contributed by each parent (Figures 1, 2, 4). Repeated selfing or crossing of the hybrid derivatives would not explain the high percentage of nuclear loci supporting the relationship of COM with Malvidae (Tables 2, 3), which suggests the possibility of subsequent backcrosses of the early hybrids to the maternal Malvidae that reduced the number of nuclear loci supporting the placement of COM with Fabidae (Figure 4). Considering the difficulty of producing fertile hybrids from crosses of distantly related lineages, this proposed ancient hybridization event should be viewed with caution.
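As a rough check on the backcrossing argument, the expected share of the nuclear genome inherited from the paternal (Fabidae) lineage can be sketched as follows. The sketch assumes neutral, unlinked loci, an F1 with half of its alleles from each parent, and that every backcross is to a pure maternal-lineage (Malvidae) parent, so that each backcross halves the expected paternal share. The function name is hypothetical, and the numbers are expectations under these simplifying assumptions rather than estimates from the data analyzed here.

```python
def expected_paternal_fraction(n_backcrosses: int) -> float:
    """Expected fraction of nuclear alleles from the paternal lineage
    after n backcrosses of an F1 hybrid to the maternal lineage.

    Assumes neutral, unlinked loci and no selection (illustrative only).
    """
    if n_backcrosses < 0:
        raise ValueError("n_backcrosses must be non-negative")
    # F1: 1/2 paternal; each backcross to the maternal lineage halves it.
    return 0.5 ** (n_backcrosses + 1)


# Expected paternal share for the F1 (0 backcrosses) and three backcrosses.
for n in range(4):
    print(n, expected_paternal_fraction(n))
```

Under these assumptions a single backcross gives an expected paternal share of 0.25, in line with the rough 25% indicated in the legend of Figure 4.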


5. Concluding remarks

Numerous plant systematics studies have demonstrated the promise of genomic data to resolve angiosperm relationships that were not evident in analyses with a few genes (Burleigh et al., 2011; Finet et al., 2010; Lee et al., 2011; Moore et al., 2010, 2011; Zeng et al., 2014). We demonstrate here that analyses of data sets with many unlinked loci can highlight the ambiguity and discordance in phylogenetic relationships and potentially reveal the complexity of angiosperm evolution. Most, but not all, single- and multi-copy nuclear loci, as well as mitochondrial genes, support the placement of COM with Malvidae. This placement is also consistent with patterns of morphological evolution (Endress and Matthews, 2006), but it contradicts the strongly supported results of analyses of chloroplast sequence data sets (Figures 1–4; Jansen et al., 2007; Moore et al., 2010, 2011; Ruhfel et al., 2014). While analyses involving a single data source, such as the chloroplast genome, seek a single phylogeny, it may be more informative to appreciate the potentially chimeric origins of COM rather than to force its placement in a binary species tree. Although with current sampling we cannot conclusively infer the processes that caused the conflicting placement of COM, our analyses emphasize the importance of phylogenomic data for highlighting phylogenetic incongruence and directing future studies.


Acknowledgements

We thank Yin-Long Qiu, who contributed to the early design of this project, and Ning Zhang, who graciously provided us with the 92-taxon, 5-gene nuDNA alignment used in this study. This work was supported by the National Natural Science Foundation of China (NNSF 31270268), National Basic Research Program of China (No. 2014CB954101), Chinese Academy of Sciences Visiting Professorship for Senior International Scientists (grant number 2011T1S24), State Key Laboratory of Systematic and Evolutionary Botany (grant number LSEB2011-10), and the US National Science Foundation (DEB-1301828).


The authors declare no conflict of interest.


References

Acosta, M.C., Premoli, A.C., 2010. Evidence of chloroplast capture in South American Nothofagus (subgenus Nothofagus, Nothofagaceae). Mol. Phylogenet. Evol. 54, 235-242.
Albach, D.C., Soltis, D.E., Soltis, P.S., Olmstead, R.G., 2001. Phylogenetic analysis of asterids based on sequences of four genes. Ann. Mo. Bot. Gard. 88, 163-212.
APG III, 2009. An update of the Angiosperm Phylogeny Group classification for the orders and families of flowering plants: APG III. Bot. J. Linn. Soc. 161, 105-121.
Bell, C.D., Soltis, D.E., Soltis, P.S., 2010. The age and diversification of the angiosperms re-revisited. Am. J. Bot. 97, 1296-1303.
Birky, C.W., 1995. Uniparental inheritance of mitochondrial and chloroplast genes: mechanisms and evolution. Proc. Natl. Acad. Sci. USA. 92, 11331-11338.
Birky, C.W., 2001. The inheritance of genes in mitochondria and chloroplasts: laws, mechanisms, models. Annu. Rev. Genet. 35, 125-148.
Bremer, K., Backlund, A., Sennblad, B., Swenson, U., Andreasen, K., Hjertson, M., Lundberg, J., Backlund, M., Bremer, B., 2001. A phylogenetic analysis of 100+ genera and 50+ families of euasterids based on morphological and molecular data with notes on possible higher level morphological synapomorphies. Pl. Syst. Evol. 229, 137-169.
Bremer, K., Friis, E., Bremer, B., 2004. Molecular phylogenetic dating of asterid flowering plants shows early Cretaceous diversification. Syst. Biol. 53, 496-505.
Buckley, T.R., Cordeiro, M., Marshall, D.C., Simon, C., 2006. Differentiating between hypotheses of lineage sorting and introgression in New Zealand alpine cicadas (Maoricicada Dugdale). Syst. Biol. 55, 411-425.
Burleigh, J.G., Hilu, K.W., Soltis, D.E., 2009. Inferring phylogenies with incomplete data sets: a 5-gene, 567-taxon analysis of angiosperms. BMC Evol. Biol. 17, 61.
Burleigh, J.G., Bansal, M.S., Eulenstein, O., Hartmann, S., Wehe, A., Vision, T.J., 2011. Genome-Scale Phylogenetics: Inferring the Plant Tree of Life from 18,896 Gene Trees. Syst. Biol. 60, 117-125.
Cantino, P.D., Doyle, J.A., Graham, S.W., Judd, W.S., Olmstead, R.G., Soltis, D.E., Soltis, P.S., Donoghue, M.J., 2007. Towards a phylogenetic nomenclature of Tracheophyta. Taxon 56, 822-846.
Chang, S.W., Oshida, T., Endo, H., Nguyen, S.T., Dang, C.N., Nguyen, D.X., Jiang, X., Li, Z.J., Lin, L.K., 2011. Ancient hybridization and underestimated species diversity in Asian striped squirrels (genus Tamiops): inference from paternal, maternal and biparental markers. J. Zool. 285, 128-138.
Chase, M.W., Soltis, D.E., Olmstead, R.G., Morgan, D., Les, D.H., Mishler, B.D., Duvall, M.R., Price, R.A., Hills, H.G., Qiu, Y.L., Kron, K.A., Rettig, J.H., Conti, E., Palmer, J.D., Manhart, J.R., Sytsma, K.J., Michaels, H.J., Kress, W.J., Karol, K.G., Clark, W.D., Hedren, M., Gaut, B.S., Jansen, R.K., Kim, K.J., Wimpee, C.F., Smith, J.F., Furnier, G.R., Strauss, S.H., Xiang, Q.Y., Plunkett, G.M., Soltis, P.S., Swensen, S.M., Williams, S.E., Gadek, P.A., Quinn, C.J., Eguiarte, L.E., Golenberg, E.Jr., Learn, G.H., Graham, S.W., Barrett, S.C.H., Dayanandan, S., Albert, V.A., 1993. Phylogenetics of seed plants: an analysis of nucleotide-sequences from the plastid gene rbcL. Ann. Mo. Bot. Gard. 80, 528-580.
Chase, M.W., Soltis, D.E., Soltis, P.S., Rudall, P.J., Fay, M.F., Hahn, W.J., Sullivan, S., Joseph, J., Molvray, M., Kores, P.J., Givnish, T.J., Sytsma, K.J., Pires, J.C., 2000. Higher-level systematics of the monocotyledons: An assessment of current knowledge and a new classification, in: Wilson, K.L., Morrison, D.A. (Eds.), Monocots: Systematics and Evolution. CSIRO Publishing, Collingwood, pp. 3-16.
Chen, F., Mackey, A.J., Stoeckert, C.J.Jr., Roos, D.S., 2006. OrthoMCL-DB: querying a comprehensive multi-species collection of ortholog groups. Nucleic Acids Res. 34, D363-368.
Comes, H.P., Abbott, R.J., 2001. Molecular phylogeography, reticulation, and lineage sorting in Mediterranean Senecio sect. Senecio (Asteraceae). Evolution 55, 1943-1962.
Corriveau, J.L., Coleman, A.W., 1988. Rapid screening method to detect potential biparental inheritance of plastid DNA and results over 200 angiosperm species. Am. J. Bot. 75, 1443-1458.
Cui, R., Schumer, M., Kruesi, K., Walter, R., Andolfatto, P., Rosenthal, G.G., 2013. Phylogenomics reveals extensive reticulate evolution in Xiphophorus fishes. Evolution 67, 2166-2179.
Degnan, J.H., Rosenberg, N.A., 2006. Discordance of species trees with their most likely gene trees. PLoS Genet. 2, e68.
Degnan, J.H., Rosenberg, N.A., 2009. Gene tree discordance, phylogenetic inference and the multispecies coalescent. Trends Ecol. Evol. 24, 332-340.
Delsuc, F., Brinkmann, H., Philippe, H., 2005. Phylogenomics and the reconstruction of the tree of life. Nat. Rev. Genet. 6, 361-375.
Doyle, J.J., 1992. Gene trees and species trees - molecular systematics as one-character taxonomy. Syst. Bot. 17, 144-163.
Drummond, A.J., Rambaut, A., 2007. BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evol. Biol. 7, 214.
Duarte, J.M., Wall, P.K., Edger, P.P., Landherr, L.L., Ma, H., Pires, J.C., Leebens-Mack, J., dePamphilis, C.W., 2010. Identification of shared single copy nuclear genes in Arabidopsis, Populus, Vitis and Oryza and their phylogenetic utility across various taxonomic levels. BMC Evol. Biol. 10, 61.
Dunn, C.W., Hejnol, A., Matus, D.Q., Pang, K., Browne, W.E., Smith, S.A., Seaver, E., Rouse, G.W., Obst, M., Edgecombe, G.D., et al., 2008. Broad phylogenomic sampling improves resolution of the animal tree of life. Nature 452, 745-749.

Durand, E.Y., Patterson, N., Reich, D., Slatkin, M., 2011. Testing for ancient admixture between closely related populations. Mol. Biol. Evol. 28, 2239-2252.
Endress, P.K., Matthews, M.L., 2006. Floral structure and systematics in four orders of rosids, including a broad survey of floral mucilage cells. Plant Syst. Evol. 260, 223-251.
Endress, P.K., Davis, C.C., Matthews, M.L., 2013. Advances in the floral structural characterization of the major subclades of Malpighiales, one of the largest orders of flowering plants. Ann. Bot. 111, 969-985.
Fauré, S., Noyer, J.L., Carreel, F., Horry, J.P., Bakry, F., Lanaud, C., 1994. Maternal inheritance of chloroplast genome and paternal inheritance of mitochondrial genome in bananas (Musa acuminata). Curr. Genet. 25, 265-269.
Finet, C., Timme, R.E., Delwiche, C.F., Marlétaz, F., 2010. Multigene phylogeny of the green lineage reveals the origin and diversification of land plants. Curr. Biol. 21, 2217-2222.
Gibson, A., Gowri-Shankar, V., Higgs, P.G., Rattray, M., 2005. A comprehensive analysis of mammalian mitochondrial genome base composition and improved phylogenetic methods. Mol. Biol. Evol. 22, 251-264.
Givnish, T.J., Pires, J.C., Graham, S.W., McPherson, M.A., Prince, L.M., Patterson, T.B., Rai, H.S., Roalson, E.R., Evans, T.M., Hahn, W.J., Millam, K.C., Meerow, A.W., Molvray, M., Kores, P., O'Brien, H.E., Kress, W.J., Hall, J., Sytsma, K.J., 2006. Phylogeny of the monocotyledons based on the highly informative plastid gene ndhF: Evidence for widespread concerted convergence, in: Columbus, J.T., Friar, E.A., Porter, J.M., Prince, L.M., Simpson, M.G. (Eds.), Monocots: Comparative Biology and Evolution Excluding Poales. Rancho Santa Ana Botanic Garden, California, pp. 28-51.
Givnish, T.J., Ames, M., McNeal, J.R., dePamphilis, C.W., Graham, S.W., Pires, J.C., Stevenson, D.W., Zomlefer, W.B., Briggs, B.G., Duvall, M.R., Moore, M.J., Heaney, J.M., Soltis, D.E., Soltis, P.S., Thiele, K., Leebens-Mack, J.H., 2010. Assembling the tree of the monocotyledons: Plastome sequence phylogeny and evolution of Poales. Ann. Mo. Bot. Gard. 97, 584-616.
Goodman, M., Czelusniak, J., Moore, G.W., Romero-Herrera, A.E., Matsuda, G., 1979. Fitting the gene lineage into its species lineage, a parsimony strategy illustrated by cladograms constructed by globin sequences. Syst. Zool. 28, 132-163.
Górecki, P., Burleigh, J.G., Eulenstein, O., 2012. GTP supertrees from unrooted gene trees: linear time algorithms for NNI based local searches, in: Bioinformatics Research and Applications, Springer, Berlin, pp. 102-114.
Goremykin, V.V., Nikiforova, S.V., Bininda-Emonds, O.R.P., 2010. Automated removal of noisy data in phylogenomic analyses. J. Mol. Evol. 71, 319-331.
Graham, S.W., Zgurski, J.M., McPherson, M.A., Cherniawsky, D.M., Saarela, J.M., Horne, E.S.C., Smith, S.Y., Wong, W.A., O'Brien, H.E., Pires, J.C., Olmstead, R.G., Chase, M.W., Rai, H.S., 2006. Robust inference of monocot deep phylogeny using an expanded multigene plastid data set. Aliso 22, 3-20.

Harrison, G.A., McLenachan, P.A., Phillips, M.J., Slack, K.E., Cooper, A., Penny, D., 2004. Four new avian mitochondrial genomes help get to basic evolutionary questions in the late Cretaceous. Mol. Biol. Evol. 21, 974-983.
Hashimoto, T., Nakamura, Y., Kamaishi, T., Nakamura, F., Adachi, J., Okamoto, K., Hasegawa, M., 1995. Phylogenetic place of mitochondrion-lacking protozoan, Giardia lamblia, inferred from amino acid sequences of elongation factor 2. Mol. Biol. Evol. 12, 782-793.
Havey, M.J., McCreight, J.D., Rhodes, B., Taurick, G., 1998. Differential transmission of the Cucumis organellar genomes. Theor. Appl. Genet. 97, 122-128.
Hilu, K.W., Borsch, T., Müller, K., Soltis, D.E., Soltis, P.S., Savolainen, V., Chase, M.W., Powell, M.P., Alice, L.A., Evans, R., Sauquet, H., Neinhuis, C., Slotta, T.A.B., Rohwer, J.G., Campbell, C.S., Chatrou, L.W., 2003. Angiosperm Phylogeny Based on matK Sequence Information. Am. J. Bot. 90, 1758-1776.
Holder, M.T., Anderson, J.A., Holloway, A.K., 2001. Difficulties in detecting hybridization. Syst. Biol. 50, 978-982.
Holland, B.R., Benthin, S., Lockhart, P.J., Moulton, V., Huber, K.T., 2008. Using supernetworks to distinguish hybridization from incomplete lineage sorting. BMC Evol. Biol. 8, 202.
Hudson, R.R., 1983. Testing the constant-rate neutral allele model with protein sequence data. Evolution 37, 203-217.
Huson, D.H., Klöpper, T., Lockhart, P.J., Steel, M.A., 2005. Reconstruction of reticulate networks from gene trees, in: Proceedings of the Ninth International Conference on Research in Computational Molecular Biology, Springer, Heidelberg, pp. 233-249.
Jansen, R.K., Cai, Z., Raubeson, L.A., Daniell, H., Depamphilis, C.W., Leebens-Mack, M.J., Muller, K.F., Guisinger-Bellian, M., Haberle, R.C., Hansen, A.K., 2007. Analysis of 81 genes from 64 plastid genomes resolves relationships in angiosperms and identifies genome-scale evolutionary patterns. Proc. Natl. Acad. Sci. USA. 104, 19369-19374.
Jerrold, I., Davis, D.W., Stevenson, Ole, G.P.S., Lisa, M., Campbell, J.V., Freudenstein, D.H., Goldman, C.R., Hardy, F.A., Michelangeli, M.P., Simmons, C.D., Specht, F.V.S., Maria, G., 2004. A phylogeny of the monocots, as inferred from rbcL and atpA sequence variation, and a comparison of methods for calculating jackknife and bootstrap values. Syst. Bot. 29, 467-510.
Joly, S., McLenachan, P.A., Lockhart, P.J., 2009. A statistical approach for distinguishing hybridization and incomplete lineage sorting. Am. Nat. 174, E54-E70.
Jones, D.T., Taylor, W.R., Thornton, J.M., 1992. The rapid generation of mutation data matrices from protein sequences. CABIOS 8, 275-282.
Judd, W.S., Olmstead, R.G., 2004. A survey of tricolpate eudicot phylogenetic relationships. Am. J. Bot. 91, 1627-1644.
Junier, T., Zdobnov, E.M., 2010. The Newick utilities: high-throughput phylogenetic tree processing in the UNIX shell. Bioinformatics 26, 1669-1670.
Katoh, K., Kuma, K.I., Toh, H., Miyata, T., 2005. MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Res. 33, 511-518.
Lee, E.K., Cibrian-Jaramillo, A., Kolokotronis, S.O., Katari, M.S., Stamatakis, A., Ott, M., Chiu, J.C., Little, D.P., Stevenson, D.W., McCombie, W.R., Martienssen, R.A., Coruzzi, G., DeSalle, R., 2011. A functional phylogenomic view of the seed plants. PLoS Genet. 7, e1002411.
Linder, C.R., Rieseberg, L.H., 2004. Reconstructing patterns of reticulate evolution in plants. Am. J. Bot. 91, 1700-1708.
Liu, L., Yu, L., Pearl, D.K., Edwards, S.V., 2009. Estimating species phylogenies using coalescence times among sequences. Syst. Biol. 58, 468-477.
Liu, L., Yu, L., Edwards, S.V., 2010. A maximum pseudo-likelihood approach for estimating species trees under the coalescent model. BMC Evol. Biol. 10, 302.
Maddison, W.P., 1997. Gene trees in species trees. Syst. Biol. 46, 523-536.
Maddison, W.P., Knowles, L.L., 2006. Inferring phylogeny despite incomplete lineage sorting. Syst. Biol. 55, 21-30.
Masoud, S.A., Johnson, L.B., Sorensen, E.L., 1990. High transmission of paternal DNA in alfalfa plants demonstrated by restriction fragment polymorphic analysis. Theor. Appl. Genet. 79, 49-55.
McCormack, J.E., Faircloth, B.C., Crawford, N.G., Gowaty, P.A., Brumfield, R.T., Glenn, T.C., 2012. Ultraconserved elements are novel phylogenetic markers that resolve placental mammal phylogeny when combined with species-tree analysis. Genome Res. 22, 746-754.
Mogensen, H.L., 1996. The hows and whys of cytoplasmic inheritance in seed plants. Am. J. Bot. 83, 383-404.
Moore, M.J., Soltis, P.S., Bell, C.D., Burleigh, J.G., Soltis, D.E., 2010. Phylogenetic analysis of 83 plastid genes further resolved the early diversification of eudicots. Proc. Natl. Acad. Sci. USA. 107, 4623-4628.
Moore, M.J., Hassan, N., Gitzendanner, M.A., Bruenn, R.A., Croley, M., Vandeventer, A., Horn, J.W., Dhingra, A., Brockington, S.F., Latvis, M., Ramdial, J., Alexandre, R., Piedrahita, A., Xi, Z., Davis, C.C., Soltis, P.S., Soltis, D.E., 2011. Phylogenetic analysis of the plastid inverted repeat for 244 species: insights into deeper-level angiosperm relationships from a long, slowly evolving sequence region. Int. J. Plant Sci. 172, 541-558.
Morton, M.C., 2011. Newly Sequenced Nuclear Gene (Xdh) for Inferring Angiosperm Phylogeny. Ann. Mo. Bot. Gard. 98, 63-89.
Okuyama, Y., Fujii, N., Wakabayashi, M., Kawakita, A., Ito, M., Watanabe, M., Murakami, N., Kato, M., 2005. Nonuniform concerted evolution and chloroplast capture: heterogeneity of observed introgression patterns in three molecular data partition phylogenies of Asian Mitella (Saxifragaceae). Mol. Biol. Evol. 22, 285-296.
Oliver, J.C., 2013. Microevolutionary processes generate phylogenomic discordance at ancient divergences. Evolution 67, 1823-1830.
Olmstead, R.G., Kim, K.J., Jansen, R.K., Wagstaff, S.J., 2000. The phylogeny of the Asteridae sensu lato based on chloroplast ndhF gene sequences. Mol. Phylogenet. Evol. 16, 96-112.

Page, R.D.M., Charleston, M.A., 1998. Trees within trees: phylogeny and historical associations. Trends Ecol. Evol. 13, 356-359.
Philippe, H., Delsuc, F., Brinkmann, H., Lartillot, N., 2005. Phylogenomics. Annu. Rev. Ecol. Evol. Syst. 36, 541-562.
Phillips, M.J., Penny, D., 2003. The root of the mammalian tree inferred from whole mitochondrial genomes. Mol. Phylogenet. Evol. 28, 171-185.
Phillips, M.J., Delsuc, F., Penny, D., 2004. Genome-Scale Phylogeny and the Detection of Systematic Biases. Mol. Biol. Evol. 21, 1455-1458.
Qiu, Y.L., Li, L.B., Wang, B., Xue, J.Y., Hendry, T.A., Li, R.Q., Brown, J.W., Liu, Y., Hudson, Y.H., Chen, Z.D., 2010. Angiosperm phylogeny inferred from sequences of four mitochondrial genes. J. Syst. Evol. 48, 391-425.
Rajan, V., 2013. A method of alignment masking for refining the phylogenetic signal of multiple sequence alignments. Mol. Biol. Evol. 30, 689-712.
Rasmussen, M.D., Kellis, M., 2011. A Bayesian approach for fast and accurate gene tree reconstruction. Mol. Biol. Evol. 28, 273-290.
Rasmussen, M.D., Kellis, M., 2012. Unified modeling of gene duplication, loss and coalescence using a locus tree. Genome Res. 22, 755-765.
Regier, J.C., Zwick, A., 2011. Sources of signal in 62 protein-coding nuclear genes for higher-level phylogenetics of arthropods. PLoS ONE 6, e23408.
Rieseberg, L.H., Soltis, D.E., 1991. Phylogenetic consequences of cytoplasmic gene flow in plants. Evol. Trends Plants 5, 65-84.
Rieseberg, L.H., Desrochers, A.M., Youn, S.J., 1995. Interspecific pollen competition as a reproductive barrier between sympatric species of Helianthus (Asteraceae). Am. J. Bot. 82, 515-519.
Rieseberg, L.H., Whitton, J., Linder, C.R., 1996a. Molecular marker incongruence in plant hybrid zones and phylogenetic trees. Acta Bot. Neerl. 45, 243-262.
Rieseberg, L.H., Sinervo, B., Linder, C.R., Ungerer, M.C., Arias, D.M., 1996b. Role of gene interactions in hybrid speciation: evidence from ancient and experimental hybrids. Science 272, 741-744.
Rosenberg, N.A., Nordborg, M., 2002. Genealogical trees, coalescent theory and the analysis of genetic polymorphisms. Nat. Rev. Genet. 3, 380-390.
Ruhfel, B.R., Gitzendanner, M.A., Soltis, D.E., Soltis, P.S., Burleigh, J.G., 2014. From algae to angiosperms – inferring the phylogeny of green plants (Viridiplantae) from 360 plastid genomes. BMC Evol. Biol. 14, 23.
Saarela, J.M., Graham, S.W., 2010. Inference of phylogenetic relationships among the subfamilies of grasses (Poaceae: Poales) using meso-scale sampling of the plastid genome. Botany 88, 65-84.
Sang, T., Zhong, Y., 2000. Testing hybridization hypotheses based on incongruent gene trees. Syst. Biol. 49, 422-434.
Savolainen, V., Fay, M.F., Albach, D.C., Backlund, A., van der Bank, M., Cameron, K.M., Johnson, S.A., Lledó, M.D., Pintaud, J.C., Powell, M., Sheahan, M.C., Soltis, D.E., Soltis, P.S., Weston, P., Whitten, W.M., Wurdack, K.J., Chase, M.W., 2000. Phylogeny of the eudicots: A nearly complete familial analysis based on rbcL gene sequences. Kew Bull. 55, 257-309.

Schumann, C.M., Hancock, J.F., 1989. Paternal inheritance of plastids in Medicago sativa. Theor. Appl. Genet. 78, 863-866.
Shore, J.S., Mcqueen, K.L., Little, S.H., 1994. Inheritance of plastid DNA in the Turnera ulmifolia complex (Turneraceae). Am. J. Bot. 81, 1636-1639.
Shore, J.S., Triassi, M., 1998. Paternally biased cpDNA inheritance in Turnera ulmifolia (Turneraceae). Am. J. Bot. 85, 328-332.
Shulaev, V., Sargent, D.J., Crowhurst, R.N., Mockler, T.C., Folkerts, O., Delcher, A.L., Jaiswal, P., Mockaitis, K., Liston, A., Mane, S.P., Burns, P., Davis, T.M., Slovin, J.P., Bassil, N., Hellens, R.P., Evans, C., Harkins, T., Kodira, C., Desany, B., Crasta, O.R., Jensen, R.V., Allan, A.C., Michael, T.P., Setuba, J.C., Celton, J., Rees, D.J.G., Williams, K.P., Holt, S.H., Rojas, J.J.R., Chatterjee, M., Liu, B., Silva, H., Meisel, L., Adato, A., Filichkin, S.A., Troggio, M., Viola, R., Ashman, T., Wang, H., Dharmawardhana, P., Elser, J., Raja, R., Priest, H.D., Bryant, Jr.D.W., Fox, S.E., Givan, S.A., Wilhelm, L.J., Naithani, S., Christoffels, A., Salama, D.Y., Carter, J., Girona, E.L., Zdepski, A., Wang, W., Kerstetter, R.A., Schwab, W., Korban, S.S., Davik, J., Monfort, A., Denoyes-Rothan, B., Arus, P., Mittler, R., Flinn, B., Aharoni, A., Bennetzen, J.L., Salzberg, S.L., Dickerman, A.W., Velasco, R., Borodovsky, M., Veilleux, R.E., Folta, K.M., 2010. The genome of woodland strawberry (Fragaria vesca). Nat. Genet. 43, 109-116.
Simon, S., Narechania, A., DeSalle, R., Hadrys, H., 2012. Insect phylogenomics: Exploring the source of incongruence using new transcriptomic data. Genome Biol. Evol. 4, 1295-1309.
Smith, S.A., Wilson, N.G., Goetz, F.E., Feehery, C., Andrade, S.C.S., Rouse, G.W., Giribet, G., Dunn, C.W., 2011. Resolving the evolutionary relationships of molluscs with phylogenomic tools. Nature 480, 364-367.
Soltis, D.E., Soltis, P.S., Chase, M.W., 1999. Angiosperm phylogeny inferred from multiple genes as a tool for comparative biology. Nature 402, 402-404.
Soltis, D.E., Kuzoff, R.K., 1995. Discordance between nuclear and chloroplast phylogenies in the Heuchera group (Saxifragaceae). Evolution 49, 727-742.
Soltis, D.E., Soltis, P.S., Chase, M.W., Mort, M.E., Albach, D.C., Zanis, M., Savolainen, V., Hahn, W.H., Hoot, S.B., Fay, M.F., Axtell, M., Swensen, S.M., Prince, L.M., Kress, W.J., Nixon, K.C., Farris, J.S., 2000. Angiosperm phylogeny inferred from 18S rDNA, rbcL, and atpB sequences. Bot. J. Linn. Soc. 133, 381-461.
Soltis, D.E., Soltis, P.S., Endress, P.K., Chase, M.W., 2005. Phylogeny and evolution of angiosperms. Sinauer Associates, Sunderland.
Soltis, D.E., Gitzendanner, M.A., Soltis, P.S., 2007. A 567-taxon data set for angiosperms: The challenges posed by Bayesian analyses of large data sets. Int. J. Plant Sci. 168, 137-157.
Soltis, D.E., Moore, M.J., Burleigh, J.G., Bell, C.D., Soltis, P.S., 2009. Molecular markers and concepts of plant evolutionary relationships: progress, promise, and future prospects. CRC Crit. Rev. Plant Sci. 28, 1-15.
Soltis, D.E., Soltis, P.S., 2009. The role of hybridization in plant speciation. Annu. Rev. Plant Biol. 60, 561-588.
Soltis, D.E., Smith, S.A., Cellinese, N., Wurdack, K.J., Tank, D.C., Brockington, S.F., Refulio-Rodriguez, N.F., Walker, J.B., Moore, M.J., Carlsward, B.S., et al., 2011. Angiosperm phylogeny: 17 genes, 640 taxa. Am. J. Bot. 98, 704-730.
Song, S., Liu, L., Edwards, S.V., Wu, S., 2012. Resolving conflict in eutherian mammal phylogeny using phylogenomics and the multispecies coalescent model. Proc. Natl. Acad. Sci. USA. 109, 14942-14947.
Stamatakis, A., Hoover, P., Rougemont, J., 2008. A rapid bootstrap algorithm for the RAxML web servers. Syst. Biol. 57, 758-771.
Testolin, R., Cipriani, G., 1997. Paternal inheritance of chloroplast DNA and maternal inheritance of mitochondrial DNA in the genus Actinidia. Theor. Appl. Genet. 94, 897-903.
Tsitrone, A., Kirkpatrick, M., Levin, D.A., 2003. A model for chloroplast capture. Evolution 57, 1776-1782.
Wang, H.C., Moore, M.J., Soltis, P.S., Bell, C.D., Brockington, S.F., Alexandre, R., Davis, C.C., Latvis, M., Manchester, S.R., Soltis, D.E., 2009. Rosids radiation and the rapid rise of angiosperm-dominated forests. Proc. Natl. Acad. Sci. USA. 106, 3853-3858.
Wendel, J.F., Schnabel, A., Seelanan, T., 1995. An unusual ribosomal DNA sequence from Gossypium gossypioides reveals ancient, cryptic, intergenomic introgression. Mol. Phylogenet. Evol. 4, 298-313.
Wendel, J.F., Doyle, J.J., 1998. Phylogenetic incongruence: window into genome history and molecular evolution, in: Soltis, D.E., Soltis, P.S., Doyle, J.J. (Eds.), Molecular systematics of plants II: DNA sequencing. Kluwer Academic Publishers, Boston, pp. 256-296.
Whitfield, J.B., Lockhart, P.J., 2007. Deciphering ancient rapid radiations. Trends Ecol. Evol. 22, 258-265.
Xi, Z., Liu, L., Rest, J.S., Davis, C.C., 2014. Coalescent versus concatenation methods and the placement of Amborella sister to water lilies. Syst. Biol. 63, 919-934.
Xiang, Q.Y.J., Manchester, S.R., Thomas, D.T., Zhang, W., Fan, C., 2005. Phylogeny, biogeography, and molecular dating of cornelian cherries (Cornus, Cornaceae): tracking Tertiary plant migration. Evolution 59, 1685-1700.
Xu, J., 2005. The inheritance of organelle genes and genomes: patterns and mechanisms. Genome 48, 951-958.
Yang, T.W., Yang, Y.A., Xiong, Z., 2000. Paternal inheritance of chloroplast DNA in interspecific hybrids in the genus Larrea (Zygophyllaceae). Am. J. Bot. 87, 1452-1458.
Yoder, J.B., Briskine, R., Mudge, J., Farmer, A., Paape, T., Steele, K., Weiblen, G.D., Bharti, A.K., Zhou, P., May, G.D., Young, N.D., Tiffin, P., 2013. Phylogenetic signal variation in the genomes of Medicago (Fabaceae). Syst. Biol. 62, 424-438.
Zeng, L., Zhang, Q., Sun, R., Kong, H., Zhang, N., Ma, H., 2014. Resolution of deep angiosperm phylogeny using conserved nuclear genes and estimates of early divergence times. Nat. Commun. 5, doi:10.1038/ncomms5956.
Zhang, N., Zeng, L.P., Shan, H.Y., Ma, H., 2012. Highly conserved low-copy nuclear genes as effective markers for phylogenetic analyses in angiosperms. New Phytol. 195, 923-937.
Zhang, J.Q., Meng, S.Y., Allen, G.A., Wen, J., Rao, G.Y., 2014. Rapid radiation and dispersal out of the Qinghai-Tibetan Plateau of an alpine plant lineage Rhodiola (Crassulaceae). Mol. Phylogenet. Evol. 77, 147-158.
Zhu, X.Y., Chase, M.W., Qiu, Y.L., Kong, H.Z., Dilcher, D.L., Li, J.H., Chen, Z.D., 2007. Mitochondrial matR sequences help to resolve deep phylogenetic relationships in rosids. BMC Evol. Biol. 7, 217.
Zou, X.H., Zhang, F.M., Zhang, J.G., Zang, L.L., Tang, L., Wang, J., Sang, T., Ge, S., 2008. Analysis of 142 genes resolves the rapid diversification of the rice genus. Genome Biol. 9, R49.


Figure Legends

Figure 1: Majority-rule consensus tree of Maximum Likelihood (ML) bootstrap analysis of the 78-gene chloroplast matrix. These data place COM with Fabidae. To highlight the position of COM, only the Fabidae, COM, and Malvidae clades are labeled, and COM is separated from the circumscription of Fabidae as previously defined (Cantino et al., 2007). All familial and ordinal names of the sampled taxa follow APG III (2009). Numbers above branches are bootstrap percentages (BS).

Figure 2: Majority-rule consensus tree of Maximum Likelihood (ML) bootstrap analysis of the 4-gene mitochondrial matrix. These data place COM with Malvidae.

Figure 3: Majority-rule consensus tree of Maximum Likelihood (ML) bootstrap analysis of the 5-gene nuclear matrix. These data place COM with Malvidae.

822

Figure 4: Hypothetical reticulation scenario for the origin of COM from the ancestral

823

Fabidae and Malvidae lineages. Large circles reflect the plant lineages; the small

824

circles represent their nuclear DNA types (the red circle represents the Fabidae

825

nuclear DNA and the blue one the Malvidae nuclear DNA), the ovals represent the

826

chloroplast (the green ovals represent the Fabidae chloroplast, and the gray oval

827

represents the chloroplast from the Malvidae ancestor), and the diamonds represent

828

the mitochondria (the gray diamond represents the Fabidae mitochondrion and the

829

orange diamond the Malvidae mitochondrion); dashed arrows represent multiple

830

generations of backcrossing. During hybridization, the mitochondrion is maternally

831

inherited from the Malvidae ancestor, and the chloroplast is paternally inherited from 31 / 40

832

the Fabidae ancestor. After subsequent F1 backcrosses to the Malvidae, the resulting

833

generations contain chloroplasts from Fabidae, mitochondria from Malvidae, and a

834

majority of the nuclear genes from Malvidae, with a smaller number from Fabidae

835

(roughly 25%). The reticulate phylogeny at the bottom illustrates this hypothetical

836

introgressive hybridization scenario and shows the phylogenetic incongruence among

837

the three genomes with respect to COM.
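The "roughly 25%" nuclear contribution mentioned in the Figure 4 legend matches a simple Mendelian expectation: an F1 hybrid carries half of its nuclear genome from each parent, and each backcross to the recurrent parent is expected to halve the remaining donor share. The sketch below illustrates only that arithmetic. The function name is hypothetical, and the assumption of neutral, unlinked loci with no selection is an idealization that is not part of the analyses reported here.

```python
from fractions import Fraction


def expected_donor_fraction(backcrosses: int) -> Fraction:
    """Expected nuclear share from the non-recurrent parent (here Fabidae).

    Models an F1 cross followed by `backcrosses` backcrosses to the
    recurrent parent (here Malvidae). Assumes neutral, unlinked loci,
    which is an idealization made for this illustration.
    """
    if backcrosses < 0:
        raise ValueError("backcrosses must be non-negative")
    # The F1 carries 1/2; every backcross halves the expected donor share.
    return Fraction(1, 2) ** (backcrosses + 1)


for n in range(4):
    share = expected_donor_fraction(n)
    print(f"backcrosses={n}: expected Fabidae nuclear share = {float(share):.4f}")
```

Under these assumptions a single backcross gives an expected share of 1/4, and further backcrosses to Malvidae would lower the expectation again.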


Table 1. Summary of the placement of COM in previous phylogenetic studies.

| Genome type | Marker^c | Taxa number | COM sampling | Relationship^a | Method of analysis/Support^b | References |
|---|---|---|---|---|---|---|
| Chloroplast | rbcL | 499 | | Nr | Character-state weighting/– | Chase et al., 1993 |
| | matK | 374 | 16 | COM + Fabidae | Parsimony/52% JK; BI/1.0 PP | Hilu et al., 2003 |
| | rbcL, atpB, 18S rDNA | 560 | 64 | COM + Fabidae | Parsimony jackknifing/77% JK; BI/1.0 PP | Soltis et al., 2000; Soltis et al., 2007 |
| | 81 cp | 64 | 3 | COM + Fabidae | ML/100% BS; MP/79% BS; BI/1.0 PP | Jansen et al., 2007 |
| | rbcL, atpB, matK, 18S rDNA, 26S rDNA | 567 | 59 | COM + Fabidae | ML/89% BS | Burleigh et al., 2009 |
| | 10 cp, 2 nu | 117 | 33 | COM + Fabidae | ML/100% BS | Wang et al., 2009 |
| | 83 cp | 86 | 5 | COM + Fabidae | ML/53% BS | Moore et al., 2010 |
| | IR | 244 | 14 | COM + Fabidae | ML/99% BS (244 taxa); ML/89% BS (87 taxa) | Moore et al., 2011 |
| | 11 cp, 2 nu, 4 mt | 640 | 154 | COM + Fabidae | ML/57% BS | Soltis et al., 2011 |
| | 78 cp | 360 | 9 | COM + Fabidae | ML/81% BS, 70% BS, 82% BS, 69% BS (ntAll, ntNo3rd, RY, AA) | Ruhfel et al., 2014 |
| | 45 cp | 41 | 3 | Malpighiales-F | ML/100% BS; BI/1.0 PP | Xi et al., 2014 |
| Mitochondrial | matR | 174 | 21 | COM + Malvidae | ML/54% BS; MP/– | Zhu et al., 2007 |
| | atp1, matR, nad5, rps3 | 380 | 26 | COM + Malvidae | ML/99% BS | Qiu et al., 2010 |
| Nuclear | 18S rDNA | 233 | / | Nr | | Soltis et al., 1997 |
| | Xdh | 247 | 19 | Oxalidales-M | ML/55% BS | Morton, 2011 |
| | SMC1, SMC2, MCM5, MLH1, MSH1 | 94 | 5 | COM + Malvidae | ML/>95% BS; BI/1.0 PP | Zhang et al., 2012 |
| | 18,896 gene trees | 136 | 15 | Malpighiales-M | GTP-ML/18% BS (136 taxa); GTP-ML/75% BS (54 taxa) | Burleigh et al., 2010 |
| | nuclear genome | 101 | 7 | COM + Malvidae | ML/>95% BS; MP/≤65% BS | Lee et al., 2011 |
| | 310 nu | 46 | 3 | Malpighiales-M | STAR/100% BP; MP-EST/100% BP; ML/100% BP; PhyloBayes/1.0 PP | Xi et al., 2014 |

Note:
^a Nr = not resolved; COM + Fabidae = COM clade was placed in Fabidae; Malpighiales-F = only one member of COM, Malpighiales, was sampled and placed in Fabidae; COM + Malvidae = COM clade sister to Malvidae; Oxalidales-M = only Oxalidales sister to Malvidae; Malpighiales-M = only Malpighiales sister to Malvidae.
^b JK = Jackknife value; BI = Bayesian inference; BS = Bootstrap value; BP = Bootstrap percentage; PP = Posterior probabilities; GTP = Gene tree parsimony; STAR = a coalescent method: Species Tree Estimation using Average Ranks of Coalescence (Liu et al., 2009); MP-EST = a coalescent method: Maximum Pseudo-likelihood for Estimating Species Trees (Liu et al., 2010); PhyloBayes = a Bayesian Monte Carlo Markov Chain (MCMC) sampler for phylogenetic reconstruction using infinite mixtures (http://megasun.bch.umontreal.ca/People/lartillot/www/downloadmpi.html).
^c 81 cp = 81 chloroplast genes (Jansen et al., 2007); 10 cp, 2 nu = 10 chloroplast genes (rbcL, atpB, matK, psbBTNH region [4 genes], rpoC2, ndhF, and rps4) and two nuclear genes (18S rDNA and 26S rDNA); 83 cp = 83 chloroplast genes (Moore et al., 2010); IR = the 25,000-bp plastid Inverted Repeat region; 11 cp, 2 nu, 4 mt = 11 chloroplast genes (rbcL, atpB, matK, psbBTNH region [4 genes], rpoC2, ndhF, rps4, and rps16), two nuclear genes (18S rDNA and 26S rDNA), and four mitochondrial genes (atp1, matR, nad5, and rps3); ntAll, ntNo3rd, RY, AA = four different character-coding matrices; ntAll = all nucleotide positions analysis; ntNo3rd = the first and second codon positions analysis; RY = RY-coded analysis; AA = the amino acid analysis; 78 cp = 78 chloroplast genes (Ruhfel et al., 2014); 45 cp = 45 chloroplast genes selected from the plastid genome (see Supplementary Table S2 in Xi et al., 2014); 310 nu = 310 low-copy nuclear genes selected from nuclear genomes and transcripts (see Supplementary Table S1 in Xi et al., 2014).

Table 2. Results from the single-copy nuclear gene analysis based on the ortholog alignments from Lee et al. (2011).

| % BS | Fabidae | Malvidae | Fabidae + Malvidae |
|---|---|---|---|
| 50 | 115 (36%) | 208 (64%) | 0 |
| 60 | 68 (34%) | 131 (66%) | 0 |
| 70 | 49 (36%) | 89 (64%) | 0 |
| 80 | 29 (38%) | 48 (62%) | 0 |
| 90 | 15 (35%) | 28 (65%) | 0 |
| 100 | 3 (25%) | 9 (75%) | 0 |

Note: The columns list the number of orthologs (out of 8,445) and the percentage (in parentheses) of all the informative genes that support a 'COM + Fabidae' topology ("Fabidae" column), a 'COM + Malvidae' topology ("Malvidae" column), or COM outside Fabidae + Malvidae ("Fabidae + Malvidae" column) with at least 50% bootstrap (BS) support.
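The percentages in Table 2 are each topology's share of the informative orthologs at a given bootstrap cutoff. A minimal sketch of that calculation follows, using the counts from the table. The dictionary layout and function name are illustrative, and rounding to the nearest whole percent is an assumption, although it reproduces the tabulated values.

```python
# Ortholog counts per topology from Table 2, keyed by bootstrap cutoff (%).
# Tuple order: (COM + Fabidae, COM + Malvidae, COM outside Fabidae + Malvidae).
TABLE2_COUNTS = {
    50: (115, 208, 0),
    60: (68, 131, 0),
    70: (49, 89, 0),
    80: (29, 48, 0),
    90: (15, 28, 0),
    100: (3, 9, 0),
}


def topology_shares(counts):
    """Return each topology's share of informative genes as whole percentages."""
    total = sum(counts)
    if total == 0:
        raise ValueError("no informative genes at this cutoff")
    # Rounding to whole percent is an assumption made for this sketch.
    return tuple(round(100 * c / total) for c in counts)


for cutoff, counts in TABLE2_COUNTS.items():
    print(cutoff, counts, topology_shares(counts))
```

Across all six cutoffs the 'COM + Malvidae' share stays between 62% and 75%, which is the pattern the table is meant to convey.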


Table 3. Results from the multi-copy gene tree analysis based on genome sequences of 22 land plant taxa.

| Reconciliation cost | % BS | Fabidae | Malvidae | Fabidae + Malvidae |
|---|---|---|---|---|
| Gene Duplications | 50 | 38 (3%) | 973 (91%) | 62 (6%) |
| | 60 | 17 (2%) | 723 (94%) | 29 (4%) |
| | 70 | 12 (2%) | 515 (96%) | 11 (2%) |
| | 80 | 4 (1%) | 308 (98%) | 3 (1%) |
| | 90 | 2 (1%) | 155 (98%) | 1 (1%) |
| | 100 | 1 (6%) | 15 (94%) | 0 (0%) |
| Gene Duplications and Losses | 50 | 446 (20%) | 1718 (76%) | 82 (4%) |
| | 60 | 267 (16%) | 1390 (81%) | 47 (3%) |
| | 70 | 149 (12%) | 1049 (86%) | 26 (2%) |
| | 80 | 72 (9%) | 715 (90%) | 11 (1%) |
| | 90 | 30 (8%) | 371 (91%) | 5 (1%) |
| | 100 | 7 (11%) | 54 (89%) | 0 (0%) |
| Deep Coalescence | 50 | 547 (27%) | 1468 (71%) | 40 (2%) |
| | 60 | 358 (23%) | 1160 (75%) | 25 (2%) |
| | 70 | 229 (20%) | 884 (79%) | 13 (1%) |
| | 80 | 114 (16%) | 598 (83%) | 8 (1%) |
| | 90 | 58 (16%) | 299 (83%) | 4 (1%) |
| | 100 | 7 (13%) | 44 (85%) | 1 (2%) |

Note: We calculated the reconciliation cost based on the minimum number of implied gene duplications, duplications + losses, and deep coalescence events, for 100 ML bootstrap gene trees made from 3,784 multi-copy gene alignments. In the table, we list the number and percentage (in parentheses) of genes (out of 3,784) that have at least 50% bootstrap (BS) support for the three possible topologies based on the reconciliation cost, respectively.
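The tallying step described in the note can be sketched as follows: for each bootstrap gene tree, the candidate topology with the lowest reconciliation cost is counted, and a gene supports a topology when that topology wins in at least the cutoff percentage of replicates. This is an illustration only and not the pipeline used for Table 3. The function names, the input layout, the example costs, and the rule that tied replicates count for no topology are all assumptions, and the reconciliation costs themselves are taken as already computed.

```python
from collections import Counter

TOPOLOGIES = ("Fabidae", "Malvidae", "Fabidae + Malvidae")


def bootstrap_support(replicate_costs):
    """Percent of bootstrap replicates won by each topology.

    replicate_costs: list of dicts mapping topology -> reconciliation cost,
    one dict per bootstrap gene tree. A topology wins a replicate only if
    its cost is strictly lowest (tie handling is an assumption here).
    """
    if not replicate_costs:
        raise ValueError("at least one bootstrap replicate is required")
    wins = Counter()
    for costs in replicate_costs:
        best = min(costs.values())
        winners = [t for t, c in costs.items() if c == best]
        if len(winners) == 1:
            wins[winners[0]] += 1
    n = len(replicate_costs)
    return {t: 100.0 * wins[t] / n for t in TOPOLOGIES}


def supported_topology(replicate_costs, cutoff=50.0):
    """Return the best-supported topology if it reaches `cutoff` %, else None."""
    support = bootstrap_support(replicate_costs)
    top = max(support, key=support.get)
    return top if support[top] >= cutoff else None


# Hypothetical costs for one gene across 4 bootstrap trees
# (the analysis in the note used 100 bootstrap trees per gene).
example = [
    {"Fabidae": 5, "Malvidae": 3, "Fabidae + Malvidae": 6},
    {"Fabidae": 4, "Malvidae": 3, "Fabidae + Malvidae": 5},
    {"Fabidae": 2, "Malvidae": 4, "Fabidae + Malvidae": 5},
    {"Fabidae": 4, "Malvidae": 4, "Fabidae + Malvidae": 6},  # tie: no winner
]
print(bootstrap_support(example))
print(supported_topology(example))
```

Running the same tally once per cost criterion (duplications, duplications plus losses, deep coalescence) and once per cutoff would give counts laid out like the rows of Table 3.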


Supplementary materials

Table S1. Individual gene analyses for COM clade from 78-gene chloroplast, 4-gene mitochondrial and 5-gene nuclear data sets.
Figure S1. RY coding tree from Maximum Likelihood (ML) bootstrap analysis of 78-gene chloroplast data set.
Figure S2. AA tree from Maximum Likelihood (ML) bootstrap analysis of 78-gene chloroplast data set.
Figure S3. Best tree from Maximum Likelihood (ML) bootstrap analysis based on the first and second codon positions only of 78-gene chloroplast data set.
Figure S4. AA tree from Maximum Likelihood (ML) bootstrap analysis of 4-gene mitochondrial data set.
Figure S5. Best tree from Maximum Likelihood (ML) bootstrap analysis based on the first and second codon positions only of 4-gene mitochondrial data set.
Figure S6. RY coding tree from Maximum Likelihood (ML) bootstrap analysis of 5-gene nuclear data set.
Figure S7. AA tree from Maximum Likelihood (ML) bootstrap analysis of 5-gene nuclear data set.
Figure S8. Best tree from Maximum Likelihood (ML) bootstrap analysis based on the first and second codon positions only of 5-gene nuclear data set.

[Figure 1 image: ML bootstrap consensus tree of the 78-gene chloroplast matrix; branch labels are bootstrap percentages. Labeled clades: Rosidae, Fabidae, COM, Malvidae. Terminal taxa:]

Bulnesia arborea Quercus nigra Glycine max Cucumis melo Ficus sp Morus indica Pentactina rupicola Euonymus americanus Oxalis latifolia Populus alba Passiflora biflora Jatropha curcas Ricinus communis Manihot esculenta Arabidopsis thaliana Carica papaya Gossypium arboreum Citrus sinensis Staphylea colchica Eucalyptus globulus Oenothera parviflora Pelargonium x Vitis vinifera Liquidambar styraciflua Heuchera sanguinea Sesamum indicum Epifagus virginiana Olea woodiana Solanum lycopersicum Ipomoea purpurea Coffea arabica Nerium oleander Aucuba japonica Lonicera japonica Daucus carota Eleutherococcus senticosus Panax ginseng Lactuca sativa Ilex cornuta Arbutus unedo Rhododendron simsii Franklinia alatamaha Cornus florida Davidia involucrata Silene vulgaris Anredera baselloides Fagopyrum esculentum Gunnera manicata Buxus microphylla Platanus occidentalis Meliosma aff cuneifolia Ranunculus macranthus Megaleranthis saniculifolia Nandina domestica Ceratophyllum demersum Saccharum hybrid Sorghum bicolor Oryza nivara Brachypodium distachyon Renealmia alpinia Musa acuminata Phoenix dactylifera Yucca schidigera Asparagus officinalis Iris virginica Lilium superbum Dioscorea elephantipes Colocasia esculenta Lemna minor Acorus americanus Piper cenocladum Calycanthus floridus Magnolia kwangsiensis Liriodendron tulipifera Chloranthus spicatus Illicium oligandrum Nuphar advena Nymphaea alba Amborella trichopoda Picea morrisonicola Pinus taeda Cycas taitungensis

[Figure 2 image: ML bootstrap consensus tree of the 4-gene mitochondrial matrix; branch labels are bootstrap percentages. Labeled clades: Rosidae, Fabidae, COM, Malvidae. Terminal taxa:]

Quercus Cucurbita Medicago Zelkova Morus Spiraea Euonymus Oxalis Euphorbia Hypericum Arabidopsis Carica Gossypium Citrus Schinus Cupaniopsis Stachyurus Guaiacum Oenothera Phytolacca Polygonum Vitis Corylopsis Cercidiphyllum Paeonia Lamium Scrophularia Syringa Nerium Galium Nicotiana Ipomoea Ilex Garrya Lonicera Helianthus Apium Hedera Pittosporum Hydrangea Cornus Actinidia Arbutus Diospyros Gunnera Pachysandra Buxus Meliosma Platanus Ranunculus Glaucidium Nandina Dicentra Ceratophyllum Chloranthus Oryza Maranta Strelitzia Chamaedorea Asparagus Iris Dioscorea Lilium Alisma Spathiphyllum Laurus Calycanthus Magnolia Liriodendron Houttuynia Thottea Aristolochia Schisandra Illicium Nuphar Brasenia Amborella Pinus Zamia

[Figure 3 image: ML bootstrap consensus tree of the 5-gene nuclear matrix; branch labels are bootstrap percentages. Labeled clades: Rosidae, Fabidae, COM, Malvidae. Terminal taxa:]

Cyclobalanopsis glauca Glycine max Morus alba Ulmus macrocarpa Photinia serrulata Cucumis sativus Euonymus carnosus Hypericum chinense Populus tricocarpa Ricinus communis Manihot esculenta Arabidopsis thaliana Arabidopsis lyrata Carica papaya Hibiscus syriacus Poncirus trifoliata Tetradium ruticarpum Rhus chinensis Sapindus mukorossi Stachyurus yunnanensis Oenothera erythrosepala Lagerstroemia limii Pelargonium hortorum Stellaria media Phytolacca americana Polygonum runcinatum Distylium buxifolium Cercidiphyllum japonicum Paeonia lactiflora Vitis vinifera Mimulus guttatus Callicarpa bodinieri Jassminum nudiflorum Solanum lycopersicum Pharbitis nil Nerium oleander Vinca major Galium aparine Ilex purpurea Aucuba japonica Lactuca sativa Lonicera japonica Hedera nepalensis Ligusticum chuanxiong Pittosporum tobira Rhododendron pulchrum Actinidia arguta Diospyros kaki Philadelphus incanus Cornus officinalis Cornus wisoniana Gunnera manicata Buxus sinica Pachysandra terminalis Platanus acerifolia Meliosma parviflora Ranunculus muricatus Aquilegia coerulea Nandina domestica Dicentra spectabilis Sorghum bicolor Setaria italica Brachypodium distachyon Oryza sativa Canna indica Musa basjoo Trachycarpus fortunei Dioscorea opposita Yucca filamentosa Asparagus officinalis Iris japonica Lilium brownii Alisma plantago-aquatica Pinellia ternata Acorus calamus Cinnamomum camphora Chimonanthus praecox Magnolia denudata Liriodendron Houttuynia cordata Asarum heterotropoides Aristolochia fimbriata Ceratophyllum taobao Chloranthus Schisandra Illicium henryi Brasenia schreber Nuphar advena Amborella trichopoda Pinus Picea Zamia fischeri

[Figure 4 image: reticulation diagram. Labels: ♀ Malvidae ancestor; Fabidae ancestor ♂; ×; COM ancestor; Fabidae; COM; Malvidae; Nuclear gene tree; Chloroplast gene tree; Mitochondrial gene tree.]

Highlights
• Report phylogenetic conflict of COM in chloroplast, mitochondrial, and nuclear data.
• Results of multi-gene and genomic data show strong evidence for deep incongruence.
• We provide an example for examination of other deep nodes of the tree of life.
• Genomic datasets highlight patterns of deep incongruence in angiosperm phylogeny.
• Stress the complexity of angiosperm evolution, which may be masked by a few genes.

[Image: reticulation diagram with the same labels as Figure 4 (Malvidae ancestor ♀, Fabidae ancestor ♂, COM ancestor; Fabidae, COM, Malvidae; nuclear, chloroplast, and mitochondrial gene trees). Key: Fabidae chloroplast DNA; Fabidae nuclear DNA; Malvidae nuclear DNA; Malvidae mitochondrial DNA.]
