YMPEV 5397

No. of Pages 17, Model 5G

29 January 2016 Molecular Phylogenetics and Evolution xxx (2016) xxx–xxx 1

Contents lists available at ScienceDirect

Molecular Phylogenetics and Evolution journal homepage: www.elsevier.com/locate/ympev 6 7 3 4 5 8 9 10 11 12 13 14 16 15 17 1 3 9 3 20 21 22 23 24 25 26 27 28 29 30 31 32

Nuclear introns outperform mitochondrial DNA in inter-specific phylogenetic reconstruction: Lessons from horseshoe bats (Rhinolophidae: Chiroptera) q Serena E. Dool a,b,1, Sebastien J. Puechmaille b,c,d,⇑,1, Nicole M. Foley d, Benjamin Allegrini e, Anna Bastian a, Gregory L. Mutumi a, Tinyiko G. Maluleke a, Lizelle J. Odendaal a, Emma C. Teeling d, David S. Jacobs a a

Department of Biological Sciences, Animal Evolution and Systematics Group, University of Cape Town, Cape Town, South Africa Zoological Institute and Museum, University of Greifswald, Soldmann-Straße 14, D-17487 Greifswald, Germany c Midi-Pyrénées bat group (CREN-GCMP), Toulouse, France d School of Biology and Environmental Science, University College Dublin, Belfield, Dublin 4, Ireland e Naturalia environnement, 34670 Baillargues, France b

a r t i c l e

i n f o

Article history: Received 11 September 2015 Revised 7 January 2016 Accepted 8 January 2016 Available online xxxx Keywords: Multi-species coalescent Rhinolophus Topology/topological comparisons Ancestral state reconstruction Tree distance metrics Introgression

q

a b s t r a c t Despite many studies illustrating the perils of utilising mitochondrial DNA in phylogenetic studies, it remains one of the most widely used genetic markers for this purpose. Over the last decade, nuclear introns have been proposed as alternative markers for phylogenetic reconstruction. However, the resolution capabilities of mtDNA and nuclear introns have rarely been quantified and compared. In the current study we generated a novel 5 kb dataset comprising six nuclear introns and a mtDNA fragment. We assessed the relative resolution capabilities of the six intronic fragments with respect to each other, when used in various combinations together, and when compared to the traditionally used mtDNA. We focused on a major clade in the horseshoe bat family (Afro-Palaearctic clade; Rhinolophidae) as our case study. This old, widely distributed and speciose group contains a high level of conserved morphology. This morphological stasis renders the reconstruction of the phylogeny of this group with traditional morphological characters complex. We sampled multiple individuals per species to represent their geographic distributions as best as possible (122 individuals, 24 species, 68 localities). We reconstructed the species phylogeny using several complementary methods (partitioned Maximum Likelihood and Bayesian and Bayesian multispecies-coalescent) and made inferences based on consensus across these methods. We computed pairwise comparisons based on Robinson–Foulds tree distance metric between all Bayesian topologies generated (27,000) for every gene(s) and visualised the tree space using multidimensional scaling (MDS) plots. Using our supported species phylogeny we estimated the ancestral state of key traits of interest within this group, e.g. echolocation peak frequency which has been implicated in speciation. Our results revealed many potential cryptic species within this group, even in taxa where this was not suspected a priori and also found evidence for mtDNA introgression. We demonstrated that by using just two introns one can recover a better supported species tree than when using the mtDNA alone, despite the shorter overall length of the combined introns. Additionally, when combining any single intron with mtDNA, we showed that the result is highly similar to the mtDNA gene tree and far from the true species tree and therefore this approach should be avoided. We caution against the indiscriminate use of mtDNA in phylogenetic studies and advocate for pilot studies to select nuclear introns. The selection of marker type and number is a crucial step that is best based on critical examination of preliminary or previously published data. Based on our findings and previous publications, we recommend the following markers to recover phylogenetic relationships between recently diverged taxa (200. The three independent runs for each alignment were combined in LOGCOM-

254

v. 1.8 with 10% discarded as burn-in and a single consensus topology was generated (maximum clade credibility tree keeping target node heights) using TREEANNOTATOR v. 1.8. ML analyses were conducted using the RAXMLGUI v. 1.3 (Silvestro and Michalak, 2012) with 500 replicates of the thorough bootstrap method and the Asian Rhinolophus species specified as outgroup taxa. As only GTR models can be specified in RAxML, the best scoring GTR model for each alignment was used (based on AIC, jMODELTEST and following recommendations in Stamatakis, 2015). Trees were visualised using FIGTREE v. 1.4 (http://tree.bio.ed.ac.uk/software/figtree/). As unstable (rogue) taxa in a phylogeny can affect phylogenetic inference (Pattengale et al., 2010), we sought to identify such taxa in our dataset using the online version of RogueNaRok (at http:// 193.197.73.70:8080/rnr/roguenarok; Aberer et al., 2013). The 500 ML Bootstrap trees and the best ML tree from the six-nuclear intron partitioned analysis were supplied to search for taxa that, if pruned from the underlying bootstrap trees, would increase support values on the best-known tree. Optimised alignment lengths, models of substitution used, base composition and number of parsimony informative sites per region amplified are shown in Table S2.

279

2.5. Species delimitation and divergence time estimation

300

The partitioned analysis of genes for phylogenetic reconstruction does not explicitly model the relationship between the gene trees and species tree, potentially leading to over confident support for the incorrect species tree (Degnan and Rosenberg, 2009; Leache and Rannala, 2011). Therefore a rjMCMC Bayesian species delimitation method which incorporates a multispecies coalescent model (*BEAST, Heled and Drummond, 2010) was also employed to complement the ML and Bayesian analyses of the partitioned dataset described above. The *BEAST analysis was conducted using the nuclear dataset only. Taxa were assigned to species based on field identification or support for such clades in the Bayesian and ML analyses of the nuclear datasets. The *BEAST analysis was also repeated with any individuals which had appeared genetically divergent in the Bayesian and ML analyses (Section 2.4) given a distinct ‘species’ attribute to maximise the chance of finding support for potential cryptic taxa (species assignments for *BEAST for

301

BINER

Please cite this article in press as: Dool, S.E., et al. Nuclear introns outperform mitochondrial DNA in inter-specific phylogenetic reconstruction: Lessons from horseshoe bats (Rhinolophidae: Chiroptera). Mol. Phylogenet. Evol. (2016), http://dx.doi.org/10.1016/j.ympev.2016.01.003

255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278

280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299

302 303 304 305 306 307 308 309 310 311 312 313 314 315 316

YMPEV 5397

No. of Pages 17, Model 5G

29 January 2016 4

S.E. Dool et al. / Molecular Phylogenetics and Evolution xxx (2016) xxx–xxx

mtDNA + Single Intron Topologies mtDNA + ABHD11 mtDNA + ACOX2 mtDNA + BGN mtDNA + COPS7A mtDNA + ROGDI mtDNA + STAT5A

4

mtDNA Topology

Single Nuclear Gene Topologies

2

ABHD11 ACOX2 BGN COPS7A ROGDI STAT5A

mtDNA (1647 bp)

1

6 intron par oned Bayesian topology aka “Reference”

3 Par al mtDNA Topologies

Nuclear Intron Combina ons BGN + COPS7A BGN + STAT5A BGN + ROGDI COPS7A + STAT5A ROGDI + COPS7A

4

2

CYTB (1103 bp) CYTB1 (551 bp) CYTB2 (552 bp) DLOOP (544 bp)

Fig. 1. Flowchart of topological comparisons made between phylogenetic datasets. The red lines indicate that the reference topology was pruned (if necessary) to contain only the taxa present in the gene(s) of interest; then the support (Bayesian Posterior Probability, BPP) from the gene(s) of interest for the reference topology was calculated. The blue lines represent data sets that were combined for multidimensional scaling plots of Robinson–Foulds distances; see Fig. 6A for ‘1’, see Fig. 6B for ‘2’, see Fig. 6C for ‘3’ and see Fig. 6D and S41–S45 for ‘4’. See the materials and methods Section 2.7 for additional details.

340

all individuals used in the current study are detailed in Table S1). The *BEAST analysis was run with default settings: Yule speciation process, piecewise linear population size and constant root, a strict clock, lognormal clock-rate priors, and random starting tree. Five independent runs of 20 million generations sampled every 500 were combined in LOGCOMBINER with the first 10% discarded as burn-in and convergence of chains confirmed in TRACER as above. A single consensus topology was generated as in the BEAST analysis. Divergence times of lineages within Rhinolophidae were estimated using the Bayesian nuclear partitioned topology and by constraining the age of the root to 16.92 Mya (million years ago; CI 15.07–18.74). This age of the root was estimated by Foley et al. (2015) based on the best available calibration method and dates currently available and is consistent with estimated dates obtained from the phylogenetic time tree of mammals (Meredith et al., 2011). Molecular dating analysis was conducted using the MCMCTREE program in PAML v. 4.8 (Yang, 2007; Yang and Rannala, 2006) using the HKY85 model of sequence evolution and independent rates clock model. The root age was specified using the CI dates above (15-07-18.74 Mya). Two independent runs each with 20,000 generations, sampling every second generation with 10% burn-in were performed. Convergence of the MCMC chains were confirmed by plotting the time estimates from each run against each other as recommended in the MCMCTree tutorial.

341

2.6. Choice of ‘‘Reference” topology

342

In the current study, six unlinked introns from the nuclear genome were amplified as well as part of the mitochondrial genome for all individuals. As we are primarily interested in the true species tree, rather than any gene trees, we use the partitioned six intron topology as our ‘reference’ topology; as such is expected to best represent the true species tree (Nichols, 2001). It is now well established that each marker considered alone (irrespective of marker type) can only produce a gene-tree which may or may not reflect the true species tree for several reasons: incomplete lineage sorting, horizontal transfer, gene duplication, gene extinction, recombination, natural selection, nonindependence of loci and population substructure in the ancestral species (Degnan and Rosenberg, 2006; Maddison, 1997; Nichols, 2001; Pamilo and Nei, 1988). That being said, the six introns share

317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339

343 344 345 346 347 348 349 350 351 352 353 354 355

the same mode of inheritance (i.e. bi-parental) and broadly represent independently evolving regions from the nuclear genome; hence their combined information is expected to be representative of the species tree, though as in every phylogenetic study, it is challenging to know with certainty if the true species tree has been fully recovered. mtDNA is known to come with even more caveats than nuclear loci (e.g. introgression and being representative of one gender [females in mammals]), briefly mentioned herein (cf. introduction & Section 4.4) but presented in detail in several papers and reviews (e.g. Ballard and Rand, 2005; Ballard and Whitlock, 2004; Balloux, 2010; Dávalos et al., 2012; Edwards and Bensch, 2009; Edwards et al., 2005; Funk and Omland, 2003; Galtier et al., 2009; Mallet, 2005; Toews and Brelsford, 2012) and therefore was not used to infer the species topology in this study. We selected the 6-nuclear intron Bayesian topology as the reference as it was congruent with the Maximum Likelihood (RAXML) and multi-species coalescent (*BEAST) topologies except in basal relationships which were globally unresolved. Therefore, any of the 6-nuclear dataset topologies would have done equally well. Whilst mitochondrial DNA may have many parsimony informative (PI) sites, it is not obvious whether such ‘information’ reflects the species tree or simply a gene tree (see Galtier et al., 2009 for a review of these issues). Mitochondrial markers are often combined with more slowly evolving loci (e.g. introns & exons; which usually contain far fewer PI sites but yet may better reflect the species tree). The resulting tree of such analyses is often considered as reflecting the species tree rather than a gene tree, although most of the signal often comes from the mitochondrial DNA and hence the resulting topology is little different from a mtDNA tree. For these reasons, we chose not to join mitochondrial and nuclear markers into a single analysis for our reference topology. We argue, and illustrate with empirical data, that considering the incongruences between such datasets brings more information about the species’ history than the does the outcome of joining such markers.

356

2.7. Topological comparisons

390

Topological comparisons were evaluated using two approaches (summarised in Fig. 1). First, we evaluated the nodal Bayesian Posterior Probability (BPP) from each gene to support the topology of the full nuclear dataset of six introns (henceforth, the ‘‘reference”

391

Please cite this article in press as: Dool, S.E., et al. Nuclear introns outperform mitochondrial DNA in inter-specific phylogenetic reconstruction: Lessons from horseshoe bats (Rhinolophidae: Chiroptera). Mol. Phylogenet. Evol. (2016), http://dx.doi.org/10.1016/j.ympev.2016.01.003

357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389

392 393 394

YMPEV 5397

No. of Pages 17, Model 5G

29 January 2016 S.E. Dool et al. / Molecular Phylogenetics and Evolution xxx (2016) xxx–xxx 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 460 458 461 459

tree/topology) based on the BEAST analysis (Fig. 1 in red). This analysis was also repeated for the six topologies made using a combination of 5 of the introns (Table S3). Secondly, we calculated the Robinson–Foulds (RF) distance between tree topologies from each gene and a variety of gene combinations, including the reference topology (Fig. 1 in blue) and used multidimensional scaling (MDS) to visualise the relationship between such trees as proposed by Hillis et al. (2005). Phylogenetic comparisons based on topology, Bayesian Posterior Probability (BPP), and Robinson–Foulds (RF) distances were made using R v.3.1.1 (R Development Core Team, 2014) and the packages: ape (v. 3.1-4, Paradis et al., 2004), phyloch (v. 1.5-5), adephylo (v. 1.1-6, Jombart et al., 2010) and phangorn (v. 1.99-7, Schliep, 2011). More specifically, for the first approach, we pruned the reference topology to include only the taxa present in the gene (s) involved in the comparison using the drop.tip function in the ape package. This was necessary to allow direct comparison between datasets as each nuclear intron (except ABHD11) contained one missing taxon relative to the reference dataset. Support (BPP) for the reference tree topology was assessed from the 27,000 trees available per gene (Bayesian analysis, see Section 2.4). The following summary statistics were calculated: proportion of nodes better supported by the reference dataset, average and standard deviation of BPP difference between the reference and the gene datasets, and proportion of nodes equally supported or better supported by the gene dataset. For the second approach of topological comparisons involving Robinson–Foulds distances, trees were unrooted and all branches were scaled such that the total tree length = 1. All datasets included in a comparison were combined into a single file: the consensus topology (N = 1) of each dataset with a subset (N = 100) of topologies from the 27,000 Bayesian trees available for that dataset (every 270th tree was selected). This resulted in a single file with 101 topologies per dataset being compared. The pairwise RF distance between each of these trees was calculated using the RF.dist function in phangorn. The resulting large matrix of distances between topologies was then visualised via an MDS plot with two dimensions created to minimise the distortions of the observed matrix via minimising the stress function (Kruskal, 1964). The MDS was run using the bestnmds function in the labdsv package with two dimensions retained, a maximum of 200 iterations and 200 random starts to find the lowest stress value. A matrix of RF distance and MDS plot were carried out on four datasets detailed hereafter. (1) The first dataset contained each nuclear intron and the 6-nuclear intron partitioned dataset to assess the performance and similarity of each nuclear intron when compared to each other and to the reference dataset. (2) The second dataset contained each nuclear intron along with five combinations of two introns (based on preliminary results suggesting which introns could best support the reference dataset considering BPP support and RF distances) and the 6-nuclear intron partitioned dataset. The aim was to investigate the performance of each nuclear intron and combinations of introns when compared to the reference dataset. (3) The third dataset included regions/subsets of the mtDNA (the first half of cytb, second half of cytb and tRNAs plus D-loop), and the full mtDNA to investigate the relative contribution of each region of mtDNA to the full mtDNA signal. (4) The fourth dataset consisted of each nuclear intron alone compared with itself when combined with the mtDNA versus the mtDNA alone. This was done in turn for each of the six introns to assess the similarity of the nuclear intron + mtDNA versus either part (i.e. intron, mtDNA) taken alone.

5

2.8. Phylogenetic signal and ancestral state reconstruction

462

Our aim was to reconstruct the ancestral state for traits of interest (peak frequency, forearm length and weight). Such reconstruction is only meaningful if the evolution of the trait is influenced by the phylogeny. Hence we tested for phylogenetic signal by assessing the tendency of related species/individuals to resemble each other more than species/individuals drawn at random from the tree using two commonly calculated indices, Pagel’s k and Blomberg’s K (Blomberg et al., 2003; Pagel, 1999). Both indices assume a Brownian motion (BM) model of trait evolution with values close to zero indicating phylogenetic independence and values close to one indicating that species traits follow expectations under the BM model. All calculations were performed in R. We estimated Pagel’s k with the function ‘phylosig’ (package phytools, Revell, 2012) and Blomberg’s K with the function ‘phylosignal’ (package picante, Kembel et al., 2010). Significance of Pagel’s k was assessed via a likelihood ratio test comparing the log-likelihood of the ML estimate of lambda and lambda = 0. Significance of Blomberg’s K was assessed by 9999 randomizations. Tests were carried out in three different ways; (i) the tree topology (and branch length) of the reference dataset was used. As the inclusion of more than one individual per species would affect Pagel’s k and Blomberg’s K, we ran the analyses 10,000 times, each time randomly keeping only one individual per species. (ii) To take into consideration the uncertainty in branch length and tree topology, we also ran the analyses for each of the 27,000 trees produced by the Bayesian analysis for the reference dataset, each time keeping only one randomly picked individual per species. (iii) Finally, we ran analyses as described in (ii) but also considering the typical within species variability for the values of interest (echolocation peak frequency, forearm length and body weight). For each species, the value of the trait was drawn from a normal distribution with parameters of the distribution (mean and sd) estimated from published and field data (Table S4). The ancestral character value for peak frequency and forearm of the Rhinolophus ancestor was estimated by fitting the data to a BM model using the residual maximum likelihood method as implemented in the ‘ace’ function (ape package, Paradis et al., 2004). Estimations were carried out with all individuals per species in two different ways: (i) the tree topology (and branch length) of the combined 6 nuclear genes and the observed peak frequency and forearm were used; (ii) to take into consideration the uncertainty in branch length and tree topology, we also ran the analyses for each of the 27,000 trees produced by the Bayesian analysis (for the 6-nuclear gene partitioned analysis), each time drawing the peak frequency and forearm from normal distributions as detailed for phylogenetic signal analyses.

463

3. Results

509

There was broad agreement between analysis methods (ML, Bayesian & multispecies coalescent Bayesian) for each of the two different datasets (partitioned 6-nuclear genes, mtDNA; Figs. 2, 3 and S1–S3; cf. Supplementary Text, section 3 for a detailed presentation of all phylogenetic relationships). Irrespective of the analysis method or the dataset used, there are two supported clades, an Afro-Palearctic clade (sometimes excluding R. hipposideros) and an Asian clade (sometimes excluding R. luctus/R. trifoliatus). However, there were topological disagreements between the partitioned 6-nuclear genes and mitochondrial trees. The first main area of discordance between datasets and analysis methods concerned the basal arrangements of taxa. In particular, the placement of R. hipposideros was highly variable. The alternative basal arrangements of taxa were as follows

510

Please cite this article in press as: Dool, S.E., et al. Nuclear introns outperform mitochondrial DNA in inter-specific phylogenetic reconstruction: Lessons from horseshoe bats (Rhinolophidae: Chiroptera). Mol. Phylogenet. Evol. (2016), http://dx.doi.org/10.1016/j.ympev.2016.01.003

464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508

511 512 513 514 515 516 517 518 519 520 521 522 523

YMPEV 5397

No. of Pages 17, Model 5G

29 January 2016 6

S.E. Dool et al. / Molecular Phylogenetics and Evolution xxx (2016) xxx–xxx Rtri THA KKW 122 Rluc THA KSO 99 Rfer BGR GAB 46 Rfer FRA SEV 47 Rfer TUN ZAG 49 Rfer IRN ROL 48 Rcli DZA DJA 17 Rcli DZA DJA 16 Rcli ZAF GHS 21 0.78/0.41 Rcli BWA LOB 14 Rcli BWA LOB 15 Rcli ZAF BDM 19 Rcli ZAF BDM 18 Rcli ZWE MTP 27 Rcli ZAF DHL 20 Rcli ZWE MNM 26 Rcli ZAF KGB 22 Rcli ZAF KSM 23 0.96/0.45 Rcli ZAF PCC 24 Rcli ZWE CHK 25 0.77/0.42 Rdam ZAF SFN 32 Rdam NAM DAN 29 Rdam ZAF ORR 31 Rdam NAM MAC 30 Rdam ZAF UFN 33 Rhil ZWE MPC 88 Rhil ZWE KPE 87 Rhil ZAF EHO 80 Rhil ZAF BBT 78 0.69/0.67 Rhil ZAF BBT 79 Rhil ZWE MWB 91 Rhil ZWE KPE 86 Rhil ZWE MWB 92 Rhil ZWE MTP 89 Rhil ZWE MUS 90 Rhil ZAF ELH 81 0.56/0.4 Rfum ZWE JET 72 Rfum ZWE CHK 71 Rfum ZWE MWB 77 Rfum ZWE MWB 76 Rfum ZWE CHK 70 Rfum ZWE MPC 74 Rfum ZWE MPC 75 Rfum ZWE MRB 73 Rmos ZMB KAL 82 Rfum NAM LUD 61 Rfum NAM DAN 59 0.97/0.55 Rfum NAM DAN 60 Rfum NAM DAN 58 0.95/ Rfum NAM NOO 64 0.51 Rfum NAM NAO 63 Rfum NAM NOO 66 Rfum NAM LUD 62 Rfum NAM NOO 65 Rmos ZWE CHC 84 Rmos ZMB LPH 83 Rmos ZWE CHC 85 Rfum SEN FAD 67 0.87/ Rfum MLI DIO 57 0.42 Rfum SEN FAD 69 Rfum SEN FAD 68 Rfum GIN SAM 56 Rfum CAF BAN 54 Rfum CAF BAN 53 Rfum CAF BAN 52 Rfum CAF BAN 50 Rfum CAF BAN 55 0.84/0.39 Rfum CAF BAN 51 Rdar ZAF ELH 36 Rdar ZAF EHO 35 Rdar ZWE MUS 39 Rdar ZAF EHO 34 0.53/0.37 Rdar ZWE MTP 38 Rdar ZAF SUD 37 Rsim ZMB SHC 110 Rsim ZAF ELH 106 Rswi ZAF ELH 114 Rsim ZMB KAL 109 Rsim ZAF KGB 108 Rsim ZWE CHK 111 Rsim BWA LOB 105 Rsim ZAF ELH 107 0.88/0.31 Rsim ZWE MAB 112 Rswi ZMB KAL 121 Rswi ZAF PAC 116 Rden NAM LUD 41 Rden ZAF KGB 43 Rden NAM GHB 40 Rden ZAF PJG 44 Rden NAM NOO 42 Rcap ZAF TBF 11 Rcap ZAF LKS 09 Rcap ZAF ZPK 13 Rcap ZAF ZPK 12 Rcap ZAF HKV 10 Rswi ZAF SPF 120 Rswi ZAF SPF 118 Rswi ZAF SPF 117 Rswi ZAF KSM 115 Rswi ZAF SPF 119 Rmeh TUN HAW 101 Rmeh TUN HAW 100 Reur TUN GHA 45 Rbla ZMB KAL 06 Rbla IRN TON 03 Rbla DZA SIG 02 Rbla ZWE GRM 08 Rbla ZAF GKC 05 Rbla ZWE CHK 07 Rbla MWI MTM 04 Rlan MLI SAN 96 Rlan MLI SAN 95 Ralc CAF SMB 01 Rlan ZWE MPC 98 Rlan ZWE MPC 97 Rhip FRA SBP 94 Rhip BGR BEA 93 Rpea THA MJC 102 Rsha VNM PQP 104 Rcre UNK UNK 28 Rpus VNM CTP 103 Rtho VNM CTP 113

*

0.94/0.67

*

*

trifoliatus-group

Rfer

*

*

ferrumequinum-group

Rcli

R. trifoliatus clade

*

*

*

*

Rdam

*

Rhil

*

*

*

*

*

*

*

*

*

* *

*

* *

0.62/0.89

0.24/1

*

*

Miocene

7.5

5

Rsim R cf. sim capensis-group Rden

*

Rcap

*

Rswi Rmeh Reur

euryale-group

Rbla Rlan Ralc Rlan Rhip

landeri-group hipposideros-group

Pleistocene

Pliocene

10

Rdar

Asian clade

*

Rfum

* *

* ** * * * * * * * * *

*

*

12.5

*

*

*

15

*

*

*

fumigatus-group

R cf. mos Rfum

A fr o - P a la e a r c t ic c la d e

*

*

*

Rfum

2.5

0

Fig. 2. Bayesian topology of Rhinolophus spp. inferred in BEAST based on the partitioned nuclear dataset (6 introns; 3406 bp) with support from Bayesian (BPP) and ML (BSS) indicated at the nodes. ⁄ Denotes nodes which were supported >0.99 BPP and 99% BSS. The species, species-groups and clades discussed in the text are highlighted by vertical bars or boxes. The time scale is simply indicative (cf. Fig. S5 for precise date estimates).

524 525 526 527 528 529 530 531 532

(1) The Afro-Palaearctic and Asian clades are monophyletic and sister to each other (nuclear *BEAST; Fig. S3). (2) R. luctus/R. trifoliatus form the basal clade, followed by a second Asian clade (comprising all remaining Asian taxa) sister to the Afro-Palearctic clade, the latter with R. hipposideros basal (nuclear dataset, Bayesian & ML; Figs. 2 and S1). (3) R. hipposideros/R. luctus form the basal clade, followed by a second Asian clade (comprising all remaining Asian taxa) sister to the Afro-Palearctic clade (mtDNA, Bayesian; Fig. 3).

(4) R. hipposideros is sister to R. luctus within the Asian clade, which is then sister to the Afro-Palearctic clade (mtDNA, ML; Fig. S2).

533 534 535 536

A second area of discordance involved relationships within the fumigatus species group (R. fumigatus, R. cf. mossambicus, R. hildebrandtii, R. darlingi, R. damarensis). Species, or geographic clades within species were highly supported and (most of the time) reciprocally monophyletic, however the relationships between

Please cite this article in press as: Dool, S.E., et al. Nuclear introns outperform mitochondrial DNA in inter-specific phylogenetic reconstruction: Lessons from horseshoe bats (Rhinolophidae: Chiroptera). Mol. Phylogenet. Evol. (2016), http://dx.doi.org/10.1016/j.ympev.2016.01.003

537 538 539 540 541

YMPEV 5397

No. of Pages 17, Model 5G

29 January 2016 7

S.E. Dool et al. / Molecular Phylogenetics and Evolution xxx (2016) xxx–xxx

*

0.99/0.77

*

* ** * *** * * 0.64/0.43 * * * * *

* * *

0.9/0.46

*

0.67/0.43

0.83/0.52

* 0.58/0.29

*

*

*

0.96/0.48

* * * * * * * * * *

* 0.97/0.73 * *

*

* * *

*

0.57/0.72

*

*

*

0.52/

*

*

*

*

*

** 0.89/0.43 * * *

*

*

*

0.99/0.71

* *

*

Rcap Rswi Rsim+ R cf. sim

capensis-group

Rden Rfum

Rfum Rfum R cf. mos Rfum

fumigatus-group

Rhil

Rdar Rdam Rfer+ Rcli Rcli

ferrumequinum-group

Rcli Rmeh Reur

euryale-group

Rbla Ralc Rlan

landeri-group

Asian clade

*

* * *

hipposideros-group trifoliatus-group

Rhip

A fr o - P a la e a r c t ic c la d e

* * * * * * * * *

*

Rhip BGR BEA 93 Rhip FRA SBP 94 Rluc THA KSO 99 Rcap ZAF HKV 10 Rcap ZAF LKS 09 Rcap ZAF TBF 11 Rcap ZAF ZPK 13 Rcap ZAF ZPK 12 Rswi ZAF KSM 115 Rswi ZAF SPF 117 Rswi ZAF SPF 119 Rswi ZAF SPF 118 Rswi ZAF SPF 120 Rsim ZWE CHK 111 Rsim ZWE MAB 112 Rswi ZMB KAL 121 Rsim ZMB SHC 110 Rswi ZAF PAC 116 Rsim BWA LOB 105 Rsim ZAF KGB 108 Rswi ZAF ELH 114 Rsim ZAF ELH 106 Rsim ZAF ELH 107 Rsim ZMB KAL 109 Rden NAM NOO 42 Rden NAM LUD 41 Rden NAM GHB 40 Rden ZAF PJG 44 Rden ZAF KGB 43 Rfum NAM DAN 60 Rfum NAM DAN 59 Rfum NAM DAN 58 Rfum NAM LUD 62 Rfum NAM LUD 61 Rfum NAM NOO 64 Rfum NAM NOO 65 Rfum NAM NOO 66 Rfum NAM NAO 63 Rfum CAF BAN 54 Rfum CAF BAN 51 Rfum CAF BAN 55 Rfum CAF BAN 53 Rfum CAF BAN 52 Rfum CAF BAN 50 Rfum SEN FAD 69 Rfum SEN FAD 68 Rfum SEN FAD 67 Rfum MLI DIO 57 Rfum GIN SAM 56 Rmos ZMB KAL 82 Rmos ZMB LPH 83 Rmos ZWE CHC 85 Rmos ZWE CHC 84 Rfum ZWE MWB 77 Rfum ZWE JET 72 Rfum ZWE MWB 76 Rfum ZWE MRB 73 Rfum ZWE CHK 71 Rfum ZWE MPC 74 Rfum ZWE MPC 75 Rfum ZWE CHK 70 Rhil ZWE MPC 88 Rhil ZAF ELH 81 Rhil ZAF EHO 80 Rhil ZAF BBT 78 Rhil ZWE MWB 91 Rhil ZWE MWB 92 Rhil ZWE MUS 90 Rhil ZWE MTP 89 Rhil ZAF BBT 79 Rhil ZWE KPE 87 Rhil ZWE KPE 86 Rdar ZWE MUS 39 Rdar ZWE MTP 38 Rdar ZAF SUD 37 Rdar ZAF EHO 34 Rdar ZAF ELH 36 Rdar ZAF EHO 35 Rdam ZAF UFN 33 Rdam ZAF SFN 32 Rdam NAM MAC 30 Rdam ZAF ORR 31 Rdam NAM DAN 29 Rfer IRN ROL 48 Rcli DZA DJA 17 Rcli DZA DJA 16 Rfer TUN ZAG 49 Rfer BGR GAB 46 Rcli ZAF KSM 23 Rcli ZAF KGB 22 Rcli ZAF DHL 20 Rcli ZAF GHS 21 Rcli ZAF BDM 19 Rcli ZAF BDM 18 Rcli BWA LOB 15 Rcli BWA LOB 14 Rcli ZWE MNM 26 Rcli ZAF PCC 24 Rcli ZWE MTP 27 Rcli ZWE CHK 25 Rmeh TUN HAW 101 Rmeh TUN HAW 100 Reur TUN GHA 45 Rbla ZWE GRM 08 Rbla ZAF GKC 05 Rbla ZWE CHK 07 Rbla ZMB KAL 06 Rbla MWI MTM 04 Rbla IRN TON 03 Rbla DZA SIG 02 Ralc CAF SMB 01 Rlan MLI SAN 96 Rlan ZWE MPC 98 Rlan ZWE MPC 97 Rpus VNM CTP 103 Rpea THA MJC 102

Fig. 3. Bayesian topology of Rhinolophus spp. inferred in BEAST based on the partitioned mtDNA dataset (1647 bp) with support from Bayesian (BPP) and ML (BSS) indicated at the nodes. ⁄ Denotes nodes which were supported >0.99 BPP and 99% BSS. The species, species-groups and clades discussed in the text are highlighted by vertical bars or boxes.

542 543 544 545 546 547 548 549 550 551 552 553

species/geographic clades were at times poorly resolved and differed between datasets and analysis methods (Figs. 2, 3 and S1–S4). Notwithstanding this, our results based on nuclear data consistently suggest that R. fumigatus is paraphyletic (with respect to the placement of R. cf. mossambicus); R. fumigatus individuals from Namibia (Figs. 2 and S3, nuclear dataset) having closer affinities to R. cf. mossambicus than to other fumigatus clades. A third area of disagreement was for the R. clivosus/R. ferrumequinum relationship between the nuclear and mtDNA datasets. Based on the mtDNA dataset, R. clivosus individuals form three well defined geographic clades from Algeria, South Africa and Northern South Africa/Zimbabwe/Botswana. The clade from Algeria is nested

within R. ferrumequinum and renders R. clivosus paraphyletic (Figs. 3 and S2). In contrast, the nuclear dataset (Figs. 2 and S1) places the individuals from Algeria basal within a monophyletic R. clivosus clade, sister to a monophyletic R. ferrumequinum clade, suggesting mitochondrial introgression between these two species (Fig. 4). Ten individuals were identified as potential ‘rogue’ taxa using the ML six-nuclear partitioned dataset. Their removal did not affect the topology at all or nodal supports overall. It did increase support for a few species groups or for geographic clades within species (i.e. recent nodes). The rogue taxa and the effect of their removal are indicated in red in Fig. S1.

Please cite this article in press as: Dool, S.E., et al. Nuclear introns outperform mitochondrial DNA in inter-specific phylogenetic reconstruction: Lessons from horseshoe bats (Rhinolophidae: Chiroptera). Mol. Phylogenet. Evol. (2016), http://dx.doi.org/10.1016/j.ympev.2016.01.003

554 555 556 557 558 559 560 561 562 563 564 565

YMPEV 5397

No. of Pages 17, Model 5G

29 January 2016 8

S.E. Dool et al. / Molecular Phylogenetics and Evolution xxx (2016) xxx–xxx

Rfer_IRN_ROL_48

*

*

*

*

Rfer_IRN_ROL_48

Rfer_BGR_GAB_46

Rfer_BGR_GAB_46

Rfer_TUN_ZAG_49

Rfer_TUN_ZAG_49

Rcli_DZA_DJA_17

Rcli_DZA_DJA_17

Rcli_DZA_DJA_16

Rcli_DZA_DJA_16

Rcli_ZAF_GHS_21

Rcli_ZAF_GHS_21

Rcli_BWA_LOB_14

Rcli_ZAF_BDM_19

Rcli_BWA_LOB_15

Rcli_ZAF_BDM_18

Rcli_ZAF_BDM_19

Rcli_ZAF_DHL_20

Rcli_ZAF_BDM_18

Rcli_ZAF_KGB_22

Rcli_ZWE_MTP_27

Rcli_ZAF_KSM_23

Rcli_ZAF_DHL_20

Rcli_BWA_LOB_14

Rcli_ZWE_MNM_26

Rcli_BWA_LOB_15

Rcli_ZAF_KGB_22

Rcli_ZWE_MNM_26

Rcli_ZAF_KSM_23

Rcli_ZWE_MTP_27

Rcli_ZAF_PCC_24

Rcli_ZAF_PCC_24

Rcli_ZWE_CHK_25

Rcli_ZWE_CHK_25

*

*

*

*

Fig. 4. Subsection of the two Bayesian topologies shown in Figs. 2 and 3 (6 nuclear introns, Left; mtDNA, Right) illustrating the relationships between individuals of R. ferrumequinum and R. clivosus, highlighting a case of mtDNA introgression between these species. The tree also shows that within R. clivosus, well defined mtDNA clades are not necessarily recovered at the nuclear DNA level.

566 567 568 569

The estimated dates of divergence from our phylogenetic analyses indicated that most species in the African Rhinolophus lineages diverged from their closest relative within the last 6 million years (Figs. 2 and S5).

570

3.1. Topological comparisons and individual gene performance

571

Whilst each individual nuclear marker was capable of recovering most species groups in addition to some highly supported nodes for relationships between species, no nuclear marker produced a highly resolved phylogeny at the majority of nodes when considered alone (Figs. S6–S17). The mtDNA produced a highly resolved gene tree, including supported geographic lineages within species (Fig. 3). When individual nuclear gene trees were forced to the reference topology (partitioned 6-nuclear genes) and nodal supports compared, it was clear that the reference tree outperformed each gene alone in at least 84% of nodes (Figs. S18–S23 and S40). The overall trend remained unchanged when the analysis was repeated excluding the gene of interest in the reference tree (Figs. S29–S34 and S40). On average, the nodal support of the reference tree was 0.41–0.54 above the nodal support for individual gene trees, particularly for recent nodes (less than 1–2 Mya; Fig. 5A). A comparison of the respective performance of individual nuclear genes indicated that ABHD11 seemed to perform worse than any other while COPS7A and ACOX2 performed best (Fig. 5). The visualisation of RF distances using an MDS plot showed that each nuclear gene globally occupies a discrete tree space suggesting each provides unique and complementary information (Fig. 6A). Further, combinations of two of the ‘best’ performing nuclear genes (Figs. S35– S39) were consistently closer in tree space to the reference topology and to other intronic combinations than any individual gene (Fig. 6B). Although the mitochondrial fragment was nearly three times greater in length than the average intron (1647 bp vs 568 bp

572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598

respectively), it did not perform markedly better at supporting the reference topology, irrespective of what metric this was based on (Figs. 5, S24 and S40). The reference tree outperformed the mtDNA tree in 69% of nodes (Figs. S24 and S40) and had an overall average nodal support higher by 0.38 BPP. It is noteworthy that the standard deviation of the difference in BPP of the reference versus each gene tree was highest for the mtDNA relative to the individual nuclear genes, indicating greater topological conflict between the mtDNA and the reference topology (Fig. S40). To investigate if different regions of the mtDNA fragment were equally informative, the mtDNA alignment was equally divided into three fragments described above (Section 2.4). Relative to the D-loop, both cytb fragments (CYTB1 and CYTB2) and cytb as a whole (CYTB) were less informative (Fig. 5) hence the signal from the mitochondrial fragment was mainly driven by the D-loop (Figs. 5 and 6C). When each of the mtDNA fragments were forced to the reference topology (Figs. S25–S28), the cytb fragments performed worse than any single nuclear gene (bar ABHD11) when considering nodal support (Fig. 5) while the D-loop performed comparably to the average single nuclear gene. When each intron in turn was combined with the full mtDNA fragment, it was obvious that the signal from such a combination was divergent from any of the individual introns and conversely, little different from the mtDNA signal considered alone (Figs. 6D and S41–S45). Indeed, the tree space occupied by each intron combined with the full mtDNA fragment was very close and/or largely overlapped with the tree space occupied by the mtDNA fragment only (Figs. 6D and S41–S45).

599

3.2. Ancestral state reconstruction

627

Of the traits considered in our study, only peak frequency and to a lesser extent forearm length, but not body weight, carried phylogenetic signal (Table 1). The estimation of the rhinolophid ancestral state with the reference topology gave a peak frequency

628

Please cite this article in press as: Dool, S.E., et al. Nuclear introns outperform mitochondrial DNA in inter-specific phylogenetic reconstruction: Lessons from horseshoe bats (Rhinolophidae: Chiroptera). Mol. Phylogenet. Evol. (2016), http://dx.doi.org/10.1016/j.ympev.2016.01.003

600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626

629 630 631

YMPEV 5397

No. of Pages 17, Model 5G

29 January 2016 S.E. Dool et al. / Molecular Phylogenetics and Evolution xxx (2016) xxx–xxx

9

Fig. 5. Analyses of the Bayesian Posterior Probability (BPP) of different data sets in support of the reference topology (datasets forced to the reference topology). Values displayed represent the (A) difference between the reference BPP and the dataset BPP and (B) the value represented in (A) multiplied by the node height. For (A) and (B), the Y-axis corresponds to the node’s minimum age (e.g. the value represented at x = 6 corresponds to the difference between the reference BPP and the dataset BPP for nodes older than 6 Mya). Nuclear genes are represented as solid lines while mtDNA fragments are represented as dashed lines. Nodes older than 9 Mya were not included in the graph as they are too few.

Fig. 6. Multidimensional scaling plot representing the tree space occupied by the different sets of genes or gene combinations sampled in Bayesian analyses. (A) Each nuclear intron along with the partitioned six nuclear introns (=reference topology, Ref.) (stress = 27%), (B) each nuclear intron along with five combinations of the best two introns and the partitioned six nuclear introns (Ref.); [1] = BGN + COPS, [2] = BGN + STAT5, [3] = BGN + ROGDI, [4] = COPS + STAT5, [5] = ROGDI + COPS (stress = 24%), (C) the full mtDNA fragment, the first and second half of cytb (CYTB1 & CYTB2) and tRNAs plus D-loop (DLOOP) (stress = 18%), (D) nuclear intron COPS on its own and then combined with the mtDNA along with the mtDNA alone (stress = 7%). Within each panel, trees coming from the same dataset are depicted with the same colour and the large filled dot with a black edge represents the consensus tree of the dataset.

Please cite this article in press as: Dool, S.E., et al. Nuclear introns outperform mitochondrial DNA in inter-specific phylogenetic reconstruction: Lessons from horseshoe bats (Rhinolophidae: Chiroptera). Mol. Phylogenet. Evol. (2016), http://dx.doi.org/10.1016/j.ympev.2016.01.003

YMPEV 5397

No. of Pages 17, Model 5G

29 January 2016 10

S.E. Dool et al. / Molecular Phylogenetics and Evolution xxx (2016) xxx–xxx

Table 1 Results of the test for phylogenetic signal (for resting peak frequency, forearm length and body weight) using two commonly calculated indices, Pagel’s k and Blomberg’s K (see text for details on the statistics and Schemes). The indices and significance were tested for a single tree (reference topology) or for the 27,000 trees produced by the Bayesian analysis for the reference dataset. The data used was either measured or simulated from a normal distribution with parameters of the distribution estimated from published and field data (see text for further details). N tree

1 27,000 27,000

632 633 634 635 636 637

Data

Real Real Simulated

Scheme

(i) (ii) (iii)

Peak frequency

Forearm

P-val

k

P-val

K

P-val

k

P-val

K

P-val

k

P-val

0.55 0.53 0.53

0.001 0.002 0.002

0.84 0.85 0.85

0.006 0.007 0.007

0.42 0.41 0.39

0.038 0.049 0.067

0.61 0.60 0.51

0.160 0.203 0.377

0.38 0.35 0.37

0.125 0.172 0.114

0.48 0.43 0.43

0.318 0.462 0.454

value of 77.17 kHz (IC95%: 72.56–81.78) and a forearm length of 50.16 mm (IC95%: 45.56–54.78) (Figs. 7 and S46) while the estimates of 83.09 kHz (IC95%: 73.40–91.61) and 47.71 mm (IC95%: 41.70–54.03) were obtained when considering uncertainty in the tree topology and between individual variation in the trait values.

638

3.3. Cryptic species

639

Out of the 12 Rhinolophus species for which we sampled more than 2 individuals, we detected the presence of eight potential cryptic species (cf. Supplementary Text, section 3 for details). Among these, three currently recognised species appear to be paraphyletic (R. fumigatus, R. landeri, R. swinnyi).

640 641 642 643

Body weight

K

4. Discussion

644

4.1. Phylogenetic reconstruction and species relationships within Rhinolophus

645

Results were generally consistent across datasets and analytical approaches. In areas of topological differences between mtDNA and the nuclear dataset, or ML versus Bayesian and multi-species coalescent approaches, statistical support tended to be low for the alternative scenarios e.g. relationships at the base of the tree, or relationships within the R. fumigatus complex. A notable exception to this was the relationship between R. ferrumequinum/R. clivosus, where opposing topologies were strongly supported in both mtDNA and nuclear datasets. All analyses provided support for the ‘AfroPalearctic’ clade, whilst Asian clade taxa were placed either in a single monophyletic clade sister to the Afro-Palearctic or as two clades

647

Fig. 7. Ancestral state reconstruction of echolocation peak frequency (kHz) throughout the horseshoe bat family (Rhinolophidae). The ancestor of all extant Rhinolophidae is inferred to have echolocated at 77.17 kHz (IC95%: 72.56–81.78). The partitioned nuclear Bayesian topology (=Reference topology; Fig. 2) was used for this analysis. Yellow circles at the tips are proportional to the peak frequency of the individual while the colours on the tree are varying along a gradient from low frequency calls in blue to high frequency calls in red (cf. gradient scale on the graph).

Please cite this article in press as: Dool, S.E., et al. Nuclear introns outperform mitochondrial DNA in inter-specific phylogenetic reconstruction: Lessons from horseshoe bats (Rhinolophidae: Chiroptera). Mol. Phylogenet. Evol. (2016), http://dx.doi.org/10.1016/j.ympev.2016.01.003

646

648 649 650 651 652 653 654 655 656 657

YMPEV 5397

No. of Pages 17, Model 5G

29 January 2016 S.E. Dool et al. / Molecular Phylogenetics and Evolution xxx (2016) xxx–xxx 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723

basal to the Afro-Palearctic clade in broad agreement with previous phylogenies on this taxonomic group (Bogdanowicz, 1992; Bogdanowicz and Owen, 1992; Qumsiyeh et al., 1988). Although not formally tested here, these results suggest a south east Asian origin for the family in agreement with Bogdanowicz and Owen (1992) and congruent with results found for the most speciose genus of bats Myotis (Ruedi et al., 2013). The position of R. hipposideros remains uncertain, although a basal position within the Afro-Palearctic clade seems most likely given our dataset (i.e. always recovered by analyses of the nuclear dataset). A recent study by Foley et al. (2015) used a larger phylogenetic dataset (10 kb of nuclear exons and introns) but also failed to unequivocally solve the placement of this species. Such lack of resolution may be caused by a very rapid diversification leading to short branch lengths at the base of the tree. Such relationships should be considered unresolved and will require either a different analytical approach or further study with larger datasets or different phylogenetic markers (e.g. retroposons, Murphy et al., 2007; Shedlock et al., 2004). Within our dataset, all five species groups previously recognised (R. capensis, R. fumigatus, R. ferrumequinum, R. euryale and R. landeri groups) were recovered with high support and many inter-species relationships were also highly supported (see Supplementary Text, section 3 for a detailed discussion addressing interspecific relationships and supported geographic lineages). The topology found in the current study is mostly concordant with the two most comprehensive published phylogenetic studies (Guillén-Servent et al., 2003; Stoffberg et al., 2010). However our results and those of Guillén-Servent et al. (2003) are dissimilar to those of Stoffberg et al. (2010) in two main respects: R. pearsoni was never recovered as the basal taxon in any of our analyses and R. mehelyi was not found as sister taxa to R. landeri, but rather to R. euryale and R. blasii. Although Stoffberg et al. (2010) used a multi-locus approach, only a single locus (THY, 484 bp) was sequenced successfully for R. pearsoni which likely did not allow the recovery of the species tree for the placement of this taxon but rather reflected a gene tree. An alternative but not mutually exclusive explanation would be that this single locus lacked phylogenetic signal, although the high support for its basal position (0.97 BPP) suggests this is unlikely. The relationships between taxa within the R. fumigatus species group differ among datasets and analytical approaches and are at times only weakly supported (Supplementary Text, section 3.5). As with many species herein included (cf. Supplementary Text, section 3), our genetic data strongly suggest that R. fumigatus is a species complex urgently requiring taxonomic revision. As successfully used in other species (e.g. Douangboubpha et al., in press; S.J. Puechmaille et al., 2014; Thong et al., 2012), a comprehensive sampling of this species group across its range and an integrative approach incorporating genetic, morphological, acoustic, and distributional data will be required to resolve the complex taxonomy of this group. A fully resolved taxonomy will greatly facilitate further phylogenetic investigations and enable meaningful comparisons between studies. Until such taxonomic issues are resolved it will not be possible to obtain the true species tree for this family. As many species are yet to be described, including many cryptic species (Baker and Bradley, 2006; Reeder et al., 2007), a substantial number of currently recognised species inevitably contains one or more species (as demonstrated in this study). When such ‘pooled’ species contain distantly related species (e.g. Soisook et al., 2015a; Thong et al., 2012), the phylogenetic position of a sample (attributed to the recognised species) will depend on which of these biological species has been sampled. Taking an example from the current study, if one was to sample R. swinnyi at the southern extremes of its range (in South Africa), it would be recovered as sister taxa to R. capensis (Figs. 2 and 3); if it was sampled in the middle of its range in Zimbabwe, it would be recovered as sister to R. simulator.

11

Although not our main goal, but not totally unexpected, our study additionally revealed that the sampling strategy could influence the recovered phylogeny as a result of unresolved taxonomy in the group studied, making interpretation of differences between studies difficult. Hence it would be judicious to make use of samples collected at or near the type locality of the taxa investigated, decreasing the chance of sampling undescribed cryptic species. Additionally, to better link phylogenies and taxonomy, it is important that information (e.g. means of identification used, exact geographic origin, specimen number if available) about samples used in phylogenetic/phylogenomic studies are available to ensure traceability of samples. Samples from voucher specimens with well-preserved DNA (ideally from a publicly accessible repository) should be preferentially used when possible. Phylogenetic reconstructions based on the concatenation of genes (supermatrix approach) has been criticised for potentially providing a poor estimation of the true species tree (Degnan and Rosenberg, 2006; Kubatko, 2007). The multi-species coalescent represents an important refinement in statistical phylogenetics which arguably better reflects biological reality by co-estimating the species tree and all gene trees in one Bayesian MCMC analysis. However, this method also comes with some drawbacks; e.g. the user must a priori assign individuals to species. This latter point is not a trivial issue in recently radiated and morphologically cryptic groups (as generally encountered across the Rhinolophus genus, Csorba et al., 2003) where taxonomic revisions are greatly needed. In the current study, we have used methods with complementary shortcomings and drew inferences when there was consensus across methods. Whilst most clades in the tree were recovered by all analysis there were some exceptions but only involving poorly supported nodes (discussed in the Supplementary Text, section 3), hence, results obtained by the different analyses were globally very similar for our dataset. Such a finding might not be too surprising given that the multi-species coalescent could be expected to provide better phylogenies for recent and especially rapid species radiations (as in the simulated and empirical datasets used by Heled and Drummond, 2010). Our dataset did not include large species-groups with rapid diversification with the possible exception of the R. fumigatus-group. In addition to the Bayesian partitioned analysis, we also analysed our nuclear (6 intron) dataset by concatenating all nuclear loci with a single model of evolution (i.e. non-partitioned; model selected by jModelTest results), this also did not result in well supported topological differences relative to the partitioned analysis (data not shown).

724

4.2. Divergence times

768

Most diversification occurred within the last 6 Mya with R. capensis and R. swinnyi being the youngest recognised taxa in the dataset. Irrespective of its position in the tree, Rhinolophus hipposideros represents the oldest lineage and diverged from other Rhinolophus species around 16 Mya. The dates of diversification agree with the most recent dating by Foley et al. (2015) (whose dating priors we followed), and also with those found by GuillénServent et al. (2003) who used a rather different dating method assuming a cytb molecular clock. Our dating estimates and hence also those of Foley et al. (2015) and Guillén-Servent et al. (2003) are however, vastly different to those found by Stoffberg et al. (2010) whose results suggested a most recent common ancestor between all extant species of Rhinolophus at 40.9 Mya (instead of 16–17 Mya in the other studies including ours). The reason for this discrepancy appears to be either inappropriate fossil priors (i.e. the minimum constraint of 33.9 Mya for the most recent common ancestor [MRCA] of extant Rhinolophus species) and/or that the priors were applied to the incorrect node (i.e. the MRCA of extant Rhi-

769

Please cite this article in press as: Dool, S.E., et al. Nuclear introns outperform mitochondrial DNA in inter-specific phylogenetic reconstruction: Lessons from horseshoe bats (Rhinolophidae: Chiroptera). Mol. Phylogenet. Evol. (2016), http://dx.doi.org/10.1016/j.ympev.2016.01.003

725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767

770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786

YMPEV 5397

No. of Pages 17, Model 5G

29 January 2016 12

S.E. Dool et al. / Molecular Phylogenetics and Evolution xxx (2016) xxx–xxx

847

nolophus instead of the MRCA of the Rhinolophidae and Hipposideridae) in Stoffberg et al. (2010). The biogeography of the family can only be speculated upon due to the unresolved basal nodes in the phylogeny. Resolution of basal lineages is of greater importance than recent nodes in such reconstructions as they often greatly influence the outcome of the reconstruction (e.g. Ruedi et al., 2012). Added to this problem is the fact that current species distributions may not reflect the species distribution in the past. This may especially be the case for old lineages such as R. hipposideros. Guillén-Servent et al. (2003) put forward an intriguing hypothesis for a European origin for the family (contingent on a basal R. hipposideros). Nevertheless, a phylogeographic study on this species by Dool et al. (2013) showed that this species likely has its origins in the Middle-East and its past distribution may have been even further to the east. Hence coding this species as from Europe (where the species was most likely absent during the timeframe of interest) might lead to an inaccurate reconstruction of the biogeographic origin of the MRCA of extant Rhinolophus species. Further work is therefore needed to obtain more reliable reconstructions of the biogeographic origin of the family and its subsequent radiation. Many taxonomic groups in Africa have similar temporal patterns of diversification to those found in the current study (i.e. during the late Pliocene and Pleistocene; Order Primates, Guschanski et al., 2013; Sithaldeen et al., 2009; Clade Ungulata, Lorenzen et al., 2012; Order Rodentia, Taylor et al., 2014; Order Squamata, Tolley et al., 2008; and diverse succellent karoo floristic groups, Verboom et al., 2009). Major events in human evolution also occurred during this time including the appearance of the earliest members of the genus Homo (2.6 Mya) and the first migration out of Africa (1.8 Mya). Trauth et al. (2009) showed that in the Southern Hemisphere there were peaks in the variability of climatic conditions during the last 3 Mya putatively driven by orbitallyinduced insolation changes influencing monsoon dynamics. The effects of these climatic changes in East Africa may have been enhanced by the formation of amplifier lakes in the East African rift system from 3 Mya, which may have formed natural barriers to many taxa (Trauth et al., 2010). The sudden transition between periods of stable climate (perhaps facilitating population expansions), to one of high variability, may have placed evolutionary pressure on taxa to adapt and diversify (Donges et al., 2011) and/ or generated fragmented distributions, increasing the likelihood of vicariant speciation (Turelli et al., 2001). Thus our data seem consistent with climate change being the driver of faunal diversification in this region during the Pleistocene although such a hypothesis should be tested more rigorously and was beyond the scope of our study. Whilst persistence of Rhinolophus lineages in Africa has been robust to climatic changes over the past millennia, the pace of current and expected climatic changes is expected to be too rapid for many taxa to cope with, even without considering levels of habitat availability or fragmentation or species-specific plasticity (Loarie et al., 2009). In the tropical and subtropical grasslands and savannah biome which covers much of sub-Saharan Africa, the velocity of such changes is expected to be on average 0.67 km/yr. Designation of substantial protected areas based on biodiversity patterns may mitigate against the impact of climate change to some degree, a further justification for urgent taxonomic resolution (Brito et al., 2014). However, the genetic diversity occurring in an area (e.g. Razgour et al., 2013) should also be considered when designating such areas (see Tolley et al., 2009).

848

4.3. Ancestral state reconstruction

849

The reconstructed evolution of echolocation peak frequency throughout the genus Rhinolophus and within the Afro-Palearctic

787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846

850

clade in particular illustrates that from an assumed mid frequency of 83.4 kHz for the ancestor, most lineages did not change markedly from this value or tended to use higher frequencies. The most notable exception to this is within the R. fumigatus/R. hildebrandtii lineages where a shift to considerably lower frequencies occurred. The estimated forearm length of the most recent common ancestor (MRCA; 47.7 mm) corresponds well to the size observed in many ‘medium-size’ Rhinolophus species (Csorba et al., 2003). Our measure of phylogenetic signal (e.g. Blomberg’s K) was less than unity (i.e. 1) and this could be caused by measurement errors or deviation from Brownian motion. However, our approach of including within-species variation in the trait values and also of using a large number of possible trees (each having its own branch length and topology), means that a deviation from a Brownian motion model is more likely, possibly caused by selection acting on the trait. Our results are consistent with recent studies showing that peak frequency can be subject to strong environmental and sexual selection pressures in Rhinolophus species (e.g. Odendaal et al., 2014; S. Puechmaille et al., 2014). More studies are critically needed to better understand the interplay between ecological and sexual selection and its consequences for the evolution of peak frequency (S. Puechmaille et al., 2014). Due to its dual role in orientation and communication, echolocation has been suggested to play an important role in speciation (Jones, 1997; Jones and Siemers, 2011; S. Puechmaille et al., 2014, 2011; Schuchmann et al., 2012) and investigating in greater detail the evolution of peak frequency through time could help to clarify the potential role of echolocation in driving speciation (S. Puechmaille et al., 2014, 2011).

851

4.4. Should we continue using mtDNA in phylogenies?

879

Due to its availability in large quantities (>100 copies per cell), high mutation rate and easy amplification (many ‘universal’ primers are available), mitochondrial DNA has been widely used to infer phylogenies. Nevertheless, many studies have reported incongruence between trees produced from mtDNA versus nuclear datasets (e.g. Springer et al., 2001). The mechanisms responsible for the incongruence are not yet fully understood but homoplasy, selection, between species’ variation in mutation rates, incomplete lineage sorting and introgression have been regularly put forward to explain the relatively poor performance of mtDNA compared to nuclear datasets (e.g. Ballard and Rand, 2005; Ballard and Whitlock, 2004; Dávalos et al., 2012; Galtier et al., 2006, 2009; Nabholz et al., 2009; Ross, 2014). Our aim was not specifically to test which mechanism is acting on our dataset but to investigate the performance of mtDNA on an empirical dataset from a diverse group of relatively recently diverged taxa. In multiple cases (e.g. R. clivosus, R. blasii), the mtDNA recovered highly supported geographic clades within species which were not supported by nuclear introns (e.g. Fig. 4). Such apparent conflict is best explained by female philopatry associated with male biased dispersal, both being very common across mammals. Additionally, the higher resolution capacity of mtDNA (given its higher number of PI sites) and/or incomplete lineage sorting in the more slowly evolving introns (especially if past population sizes were large) could also have impacted upon the contrasting results found in the mtDNA versus intronic dataset. In the case of female philopatry and male-biased dispersal, mtDNA, only inherited through females, would show strong geographic structure while nuclear DNA, inherited via both sexes would show weaker or no structure because gene flow is male mediated (e.g. Boston et al., 2014; Dool et al., 2013; Razgour et al., 2013). For this reason, recently diverged mtDNA clades (even when divergence is large; e.g. see Andriollo et al., 2015) not supported by nuclear data have to be interpreted with extreme caution with respect to taxonomy. However, for phylogeographic studies such intra-specific clades often provide

880

Please cite this article in press as: Dool, S.E., et al. Nuclear introns outperform mitochondrial DNA in inter-specific phylogenetic reconstruction: Lessons from horseshoe bats (Rhinolophidae: Chiroptera). Mol. Phylogenet. Evol. (2016), http://dx.doi.org/10.1016/j.ympev.2016.01.003

852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878

881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914

YMPEV 5397

No. of Pages 17, Model 5G

29 January 2016 S.E. Dool et al. / Molecular Phylogenetics and Evolution xxx (2016) xxx–xxx 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980

insights into the recent history of the species (e.g. Boston et al., 2014; Dool et al., 2013). Another common caveat with mtDNA lies in the fact that in cases of past or present introgression between species, the same mtDNA can be shared between species, resulting in a gene tree rather than a species tree (Nichols, 2001). Cases of introgression have already been documented in bats (e.g. Berthier et al., 2006; Mao et al., 2013; Nesi et al., 2011) and here we provide yet another example of mtDNA introgression, most likely from R. ferrumequinum into the Algerian population of R. clivosus. Unless uncovered, cases of introgression inevitably lead to the wrong phylogeny and wrong taxonomic inferences. Evidence of mtDNA introgression is important to consider in light of recent phylogenetic studies focused on these sister taxa based on mtDNA only (Benda and Vallo, 2012). As in many situations, a more complete geographic sampling combined with the use of nuclear loci would be critical to untangle potential cryptic species within these sister taxa (see Supplementary Text, section 3), especially where they exist in sympatry or close parapatry. Although not detailed here, incongruence between mtDNA and nuclear DNA can result from the inadvertent inclusion in the dataset of Numts (Bensasson et al., 2001; Thalmann et al., 2004). Although we amplified a large fragment for many individuals per species and did not find any stop codon in the CDS, we cannot rule out the presence of Numts, which have already been identified in bats (Puechmaille et al., 2011), including Rhinolophus species (Dool et al., 2013; Mao et al., 2014). As the mitochondrial genome is inherited as a single unit, amplifying more than one mtDNA fragment for phylogenetic reconstruction does not yield independent information (as would for example two unlinked nuclear loci), and thus provides limited benefits beyond increasing resolution of the gene tree and detectability of Numts (by detecting significant incongruence between different mtDNA gene trees). As most research projects are limited by time and financial constraints, it would make more sense to invest in sequencing further nuclear loci instead of multiple mtDNA fragments if recovery of the species tree is desired (Nichols, 2001). Notwithstanding the above critique, when used alongside nuclear DNA (e.g. the current study; Berthier et al., 2006; Mao et al., 2013), mtDNA is still a very useful molecular marker in phylogenetic studies for revealing the presence of introgression. Given that mtDNA is easier to amplify than nuclear DNA (cf. introduction), it is faster and cheaper to amplify one mtDNA fragment compared to multiple nuclear fragments. Hence in certain cases, for example in groups with well resolved taxonomy but lacking a phylogeny, a mtDNA phylogeny will give a preliminary (i.e. it will likely be incorrect for some nodes) but global overview of the phylogenetic relationships quickly and cheaply. The use of mtDNA also benefits from a large number of sequences already available on online databases (e.g. GenBank, BOLD) while datasets for nuclear markers are much smaller in terms of coverage, both within and between species. If a decision is made to use a mtDNA fragment, our results with Rhinolophus species suggest that it could be better to use the control region rather than cytochrome b (especially when considering the first part of cytb). However, unless already tested, a pilot study should be carried out to assess the suitability of the control region for the taxonomic group of interest, especially as some mammals harbour repeated sequence arrays that would hinder the usefulness of this marker (Wilkinson et al., 1997). When the nuclear marker route is favoured, it is important to note that different nuclear fragments might carry different levels of information and the selection of the markers based on already published closely related species or via pilot studies is greatly advised. Although commonly used, we would, for example, advise against the use of exons that are typically used to resolve interfamily relationships (or older events; e.g. RAG exons) for studies

13

interested in shallow timeframes (e.g.

Nuclear introns outperform mitochondrial DNA in inter-specific phylogenetic reconstruction: Lessons from horseshoe bats (Rhinolophidae: Chiroptera).

Despite many studies illustrating the perils of utilising mitochondrial DNA in phylogenetic studies, it remains one of the most widely used genetic ma...
2MB Sizes 0 Downloads 13 Views