MOLECULAR PHYLOGENETICS AND EVOLUTION Vol. 1, No. 1, March, pp. 41-52, 1992

Evolution of the Mitochondrial Cytochrome Oxidase II Gene among 10 Orders of Insects

HONG LIU AND ANDREW T. BECKENBACH

Institute of Molecular Biology and Biochemistry, Department of Biological Sciences, Simon Fraser University, Burnaby, British Columbia, Canada V5A 1S6

Received September 23, 1991; revised January 8, 1992

We examine the complete nucleotide sequences of the mitochondrial cytochrome oxidase II gene of 13 species of insects, representing 10 orders. The genes range from 673 to 690 bp in length, encoding 226 to 229 amino acids. Several insertion or deletion events, each involving one or two codons, can be observed. The 3' end of the gene is extremely variable in both length and sequence, making alignment of the ends unreliable. Using the first 639 nucleotide positions, for which unambiguous alignments could be obtained, we examine the neighbor-joining trees based on nucleotide divergences and based on conserved subsets of those data, including transversion, amino acid, and second codon position divergences. Each of these subsets produces different trees, none of which can be easily reconciled with trees constructed using morphology and the fossil record. Bootstrap analysis using second codon positions strongly supports affinities between the order Blattaria (cockroaches) and the order Isoptera (termites) and between a wasp and the published honeybee sequence (Order Hymenoptera). The divergence of insect orders is very ancient and may have occurred too rapidly for easy resolution using mitochondrial protein sequences. Unambiguous resolution of insect orders will probably require analysis of many additional taxa, using the COII gene and other conserved sequences. © 1992 Academic Press, Inc.

Sequence data from this article have been deposited with the GenBank Data Libraries under Accession Nos. M83959-M83971.

INTRODUCTION

Insects constitute by far the greatest proportion of described animal species. Fossil evidence concerning their origins is scanty, but offers some insight into the radiation of the major groups. A number of authors have combined cladistic analysis of morphological and developmental characters with the available fossil record to determine a likely sequence of events (Wooten, 1981; Hennig, 1981; Boudreaux, 1979; Kristensen, 1981).

The first insect fossils appear in the upper Carboniferous, about 325 million years (Myr) ago (Wooten, 1981). By that time they had already undergone substantial radiation. It is generally believed that the Paleoptera, represented by order Odonata (dragonflies) in this study, had separated from the Neoptera (which includes the other nine orders examined here) by the end of the Ordovician, more than 350 Myr ago (Hennig, 1981). The most common insect fossils of the Carboniferous belong to the Orthopteroid groups, including the Blattaria (cockroaches) and the Orthoptera (crickets and grasshoppers) (Wooten, 1981). Fossils assignable to the Hemiptera (true bugs) and Coleoptera (beetles) appear during the Permian (Wooten, 1981). The Diptera (true flies) and Hymenoptera (wasps, ants, and bees) do not appear until the Triassic (Hennig, 1981). For the Isoptera (termites), Siphonaptera (fleas), and Lepidoptera (butterflies and moths), the fossil record provides few clues concerning their origins. Perhaps the firmest conclusions that can be reached are that the radiation of extant insect orders occurred over a span of about 70 Myr and was essentially complete by the start of the Mesozoic, about 280 Myr ago (Hennig, 1981).

Cladistic analysis suggests that the holometabolous insects (those undergoing complete metamorphosis, including, among others, Coleoptera, Hymenoptera, Lepidoptera, Diptera, and Siphonaptera) form a monophyletic group (Wooten, 1981; Hennig, 1981; Boudreaux, 1979; Kristensen, 1981). This belief is based primarily on the assumption that the insertion of a pupal stage between the larval and the adult stages could not have arisen more than once (Hennig, 1981). In the absence of a definitive fossil record, molecular data may provide valuable insight into the origins of these groups.
It is clear that the phylogenies are quite deep and that it will be necessary to examine highly conserved sequences or sites, or perhaps gene order in the mitochondrial genome (Moritz et al., 1987). To determine whether sequence comparisons of a conserved mitochondrial gene, cytochrome oxidase II (COII), could clarify some of these relationships, we have sequenced the gene from representative members of 10 orders of insects, using PCR amplification of the complete gene and direct sequencing.

1055-7903/92 $5.00 Copyright © 1992 by Academic Press, Inc. All rights of reproduction in any form reserved.

MATERIALS AND METHODS

DNA Sources

A single individual of each species, either from stocks maintained at the Simon Fraser University insectary or freshly caught, was used as source material for DNA extraction. Species names and the orders to which they belong are shown in Table 1.

TABLE 1

List of Insect Species Studied in This Paper

Name                                        Order
Cockroach (Periplaneta americana)           Blattaria
Ladybird beetle (Adalia bipunctata)         Coleoptera
Grain weevil (Sitophilus granarius)         Coleoptera
Fruit fly (Drosophila pseudoobscura)        Diptera
Milkweed bug (Oncopeltus fasciatus)         Hemiptera
Wasp (Exeristes roborator)                  Hymenoptera
Ant (Lasius sp.)                            Hymenoptera
Termite (Zootermopsis angusticollis)        Isoptera
Moth (Galleria mellonella)                  Lepidoptera
Dragonfly (Sympetrum striolatum)            Odonata
Cricket (Acheta domesticus)                 Orthoptera
Locust (Schistocerca gregaria)              Orthoptera
Flea (Ctenocephalides felis)                Siphonaptera

DNA Extraction

A small amount of tissue from each insect was ground in 60 µl protease buffer (0.1 M Tris-OH, pH 8.0, 0.05 M EDTA, 0.2 M NaCl, 1% SDS, with 0.4 mg/ml protease K) and immediately placed at 65°C for 3 min. The resulting solution was extracted once with phenol (saturated with 10 mM Tris, 1 mM EDTA, pH 8.0) and once with chloroform/isoamyl alcohol (24/1). The supernatant was removed and 2 vol of 95% ethanol was added to precipitate the DNA. The suspension was pelleted at 40K rpm for 15 min at 4°C, washed twice with 70% ethanol, dried under vacuum, and dissolved in 50 µl sterile ddH2O.

Amplification and Sequencing

The COII gene was amplified using the polymerase chain reaction (PCR) performed in 50-µl reactions using the GeneAmp kit with AmpliTaq DNA polymerase (Perkin-Elmer-Cetus). We followed the protocol provided with the kit. Thermal cycling was done in an Ericomp programmable cyclic reactor. The PCR consisted of 35 cycles performed as follows: denaturation at 94°C for 1 min, annealing at 50°C for 1 min, and extension at 70°C for 2 min.

The primers for PCR amplification and sequencing are given in Table 2. Primers labeled A are oriented 5' → 3' in the direction of transcription of the COII gene. Primers labeled B are oriented in the opposite direction. The COII gene in Drosophila is flanked by tRNALeu and tRNALys (Clary and Wolstenholme, 1985), and the end primers, A-tLEU and B-tLYS, were based on the published sequences. The numbers refer to the position of the 5' nucleotide in the Drosophila yakuba COII sequence. The internal primers, used for sequencing only, were chosen in regions of sequence conservation between D. yakuba and other insects and were based on partial sequences as they were obtained. In the ant, the B-tLYS primer amplified from a second site within the COII gene, yielding a truncated sequence. This problem was circumvented by using the B-ATP8 primer (Table 2) in place of B-tLYS for amplification in this species.

The double-stranded DNA was purified by electrophoresis in low-melting agarose gel, followed by concentration using Amicon 30 microconcentrators. About 7 µl was used for double-stranded sequencing using the Sequenase kit (United States Biochemical Corp.). We used modifications of the standard Sequenase protocol given by Palumbi et al. (The Simple Fool's Guide to PCR, compiled by S. Palumbi, A. Martin, S. Romano, W. O. McMillan, L. Stice, and G. Grabowski).

TABLE 2

Primers Used in Amplification and Sequencing of Insect COII mtDNA

Name of primer   Sequence                               Positions spanned(a)
A-tLEU           5'-ATGGCAGATTAGTGCAATGG-3'             3018-3038
B-tLYS           5'-GTTTAAGAGACCAGTACTTG-3'             3804-3784
A-171            5'-GGTCAAACAATTGAGTCTATTTGAAC-3'       3255-3281
A-226            5'-ATTGCATTACCATCTCTACGATT-3'          3307-3330
A-298            5'-ATTGGACATCAATGATATTGA-3'            3379-3400
B-434            5'-GGTAAAACTACTCGATTATCAAC-3'          3516-3493
A-470            5'-AGATGTTCTTCACTCATGA-3'              3553-3572
A-592            5'-GAAATTTGTGGAGCAAATCATAG-3'          3673-3696
B-ATP8           5'-TCATCTTATAGGTACTATTTGAGG-3'         3937-3913

(a) The position of each primer refers to the complete D. yakuba sequence.
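The coordinates in Table 2 allow a rough check on expected product sizes. The following is a minimal sketch, assuming that each primer anneals at the D. yakuba position listed in Table 2 and that the product runs from the 5' nucleotide of the A primer to the 5' nucleotide of the B primer. The names PRIMER_POSITIONS and product_length are introduced here for illustration only; actual product lengths in other species will differ wherever the gene or its flanking regions vary in length.

```python
# Positions spanned by each primer in the D. yakuba sequence (from Table 2).
# The first number of each pair is the position of the primer's 5' nucleotide.
PRIMER_POSITIONS = {
    "A-tLEU": (3018, 3038),
    "B-tLYS": (3804, 3784),
    "A-171": (3255, 3281),
    "A-226": (3307, 3330),
    "A-298": (3379, 3400),
    "B-434": (3516, 3493),
    "A-470": (3553, 3572),
    "A-592": (3673, 3696),
    "B-ATP8": (3937, 3913),
}


def product_length(forward, reverse):
    """Expected product length (bp) in D. yakuba coordinates.

    `forward` must be an A primer and `reverse` a B primer; the product is
    assumed to span the two 5' nucleotides inclusively.
    """
    if not forward.startswith("A-") or not reverse.startswith("B-"):
        raise ValueError("expected an A primer followed by a B primer")
    forward_5prime = PRIMER_POSITIONS[forward][0]
    reverse_5prime = PRIMER_POSITIONS[reverse][0]
    if reverse_5prime <= forward_5prime:
        raise ValueError("B primer lies upstream of the A primer")
    return reverse_5prime - forward_5prime + 1


if __name__ == "__main__":
    # End primers used for most species, and the substitute used for the ant.
    print(product_length("A-tLEU", "B-tLYS"))
    print(product_length("A-tLEU", "B-ATP8"))
```

Under these assumptions the standard end primers bracket a region somewhat longer than the gene itself, which is consistent with their location in the flanking tRNA genes.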

Data Analysis

Sequences were aligned and compared with the aid of the Eyeball Sequence Editor, ESEE (Cabot and Beckenbach, 1989). The alignment was done using translated sequences applied to the nucleotide sequences. Translations to amino acid sequences used the Drosophila mitochondrial code (de Bruijn, 1983).

We performed our analyses using distance matrices based on the complete nucleotide sequences, the inferred translated sequences, transversion differences, and second codon positions. Distances were corrected using the Jukes-Cantor method (Jukes and Cantor, 1969). Kimura two-parameter corrections (Kimura, 1980) were also calculated for comparisons involving the complete nucleotide sequences and the second codon positions. In the absence of independent estimates of transition and transversion mutation rates, the two-parameter method uses the observed transition/transversion ratio. In the comparisons presented here, observed transversions exceed the transitions, and this method fails to adequately compensate. Only the Jukes-Cantor corrections are reported here, although they probably represent underestimates of the actual sequence divergences.

The distance trees were constructed by the neighbor-joining method (Saitou and Nei, 1987), using the NEIGHBOR program, version 3.4 of PHYLIP (Felsenstein, 1989). Bootstrap analyses were performed on the second codon position data using both parsimony and distance methods. Parsimony bootstrapping was done using the DNABOOT program of PHYLIP, while the distance matrix bootstrapping was carried out using SEQBOOT, DNADIST, NEIGHBOR, and CONSENSE of the PHYLIP package (Felsenstein, 1989).

RESULTS

Nucleotide Sequences

Figure 1 shows the aligned nucleotide sequences of the cytochrome oxidase II genes (COII) from 13 species, representing 10 orders of insects. The genes range from 673 bp in the wasp to 690 bp in the milkweed bug, and encode 226-229 amino acids. Most of the sequences have ATG for initiation, although ATA occurs at the beginning of the ant and milkweed bug sequences, and ATT appears to serve as initiator in both beetle sequences. In six of the sequences, only the T of the stop codon is coded in the DNA sequence, while seven of the species have TAA at the 3' end, potentially encoding the entire terminator.

Most of the variation in length of the COII gene occurs at or near the 3' end of the gene. Several internal insertion/deletion events can also be seen. In the ant sequence, a deletion of three nucleotides appears at about positions 16-19. In both the wasp and the cricket sequences, a deletion of two codons occurs at approximately positions 390-395 in the nucleotide sequences. The regions corresponding to these deletions are subject to considerable variation at the amino acid level, so the alignments of those deletions cannot be made with absolute certainty. In particular, it is clear that the deletions in the wasp and cricket are independent events and may not involve identical codon positions. This two-codon deletion is also present in the honeybee sequence (Crozier et al., 1989), but is absent from the other Hymenopteran sequenced here, the ant. An insertion of three nucleotides occurs in the cricket sequence, at positions 349-351. This insertion is not shared by the locust, a member of the same order, nor by any other of the sequences examined here.

Codon Usage

Codon usage totaled for the COII gene in all 13 taxa is given in Table 3. Codon usage in the insects reflects the bias against codons ending in G, as is typical of other mitochondrial protein coding sequences (Brown, 1985). Several codons that are rarely, if ever, used in the sequences obtained from other organisms appear as apparent sense codons in these insects. AGG, which is evidently a terminator in some mammals (Brown, 1985) and which does not occur in the Drosophila yakuba sequence (Clary and Wolstenholme, 1985), appears in these sequences. Codon position 205 (nucleotide positions 613-615) is a conserved serine in the other 12 taxa and is coded by AGG in the weevil sequence. Two other rare codons, CAG (glutamine) and CGC (arginine), occur occasionally in the insect COII gene.

Inferred Amino Acid Sequences

Figure 2 shows the inferred amino acid sequences for the species studied here. Translations used the Drosophila mitochondrial code (de Bruijn, 1983), including the codon assignments noted above. Two regions are highly conserved at the amino acid level. The region from amino acid positions 101 to 113 has been shown to be involved in the electron transfer system (Millett et al., 1983). The second region, from positions 193 to 203 in the amino acid sequences, includes most of the copper binding site of the protein. The conserved histidine at position 161 also forms part of the copper binding site (Millett et al., 1983). Other individual residues known to be conserved in COII sequences from vertebrates, yeast, and maize are also conserved in all the insects studied here. They include histidine (position 24), glutamic acid (62 and 212), and aspartic acid (88, 139, 158, and 173). Several positions known to be conserved among bovine, yeast, and maize COII sequences show variation in some of the insect groups. Codon 11, which generally codes for aspartic acid, appears to code for asparagine in four of

FIG. 1. Sequences of the mitochondrial COII gene of 13 species of insects. Dots indicate identity to the D. pseudoobscura sequence. Dashes indicate deletion/insertion events.

TABLE 3

Codon Usage in COII Genes of 13 Species of Insects

Phe (F)  TTT  139    Ser (S)  TCT   62    Tyr (Y)  TAT   92    Cys (C)  TGT   21
         TTC   31             TCC   17             TAC   21             TGC    8
Leu (L)  TTA  220             TCA   96    TER      TAA    7    Trp (W)  TGA   77
         TTG    4             TCG    1             TAG    0             TGG    1

Leu (L)  CTT   44    Pro (P)  CCT   56    His (H)  CAT   52    Arg (R)  CGT   22
         CTC   12             CCC   17             CAC   25             CGC    4
         CTA   69             CCA   52    Gln (Q)  CAA   86             CGA   59
         CTG    2             CCG    3             CAG    4             CGG    4

Ile (I)  ATT  297    Thr (T)  ACT   62    Asn (N)  AAT  178    Ser (S)  AGT   20
         ATC   62             ACC   19             AAC   40             AGC    4
Met (M)  ATA  148             ACA   95    Lys (K)  AAA   71             AGA   46
         ATG   14             ACG    3             AAG    9             AGG    4

Val (V)  GTT   50    Ala (A)  GCT   31    Asp (D)  GAT   95    Gly (G)  GGT   27
         GTC    7             GCC   18             GAC   29             GGC    8
         GTA   65             GCA   59    Glu (E)  GAA  115             GGA   58
         GTG    4             GCG    1             GAG    7             GGG   10
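The bias against G-ending codons noted under Codon Usage can be checked directly from the totals in Table 3. The following is a minimal sketch; the counts are transcribed from Table 3 and should be treated as illustrative, since any transcription error would carry through. The names TABLE3_COUNTS, third_position_totals, and third_position_fractions are introduced here for illustration.

```python
# Codon totals transcribed from Table 3 (an assumption of this sketch: the
# transcription is correct). Keys are the first two codon bases; each tuple
# holds counts for third base T, C, A, G, in that order.
THIRD_BASES = "TCAG"

TABLE3_COUNTS = {
    "TT": (139, 31, 220, 4),
    "TC": (62, 17, 96, 1),
    "TA": (92, 21, 7, 0),
    "TG": (21, 8, 77, 1),
    "CT": (44, 12, 69, 2),
    "CC": (56, 17, 52, 3),
    "CA": (52, 25, 86, 4),
    "CG": (22, 4, 59, 4),
    "AT": (297, 62, 148, 14),
    "AC": (62, 19, 95, 3),
    "AA": (178, 40, 71, 9),
    "AG": (20, 4, 46, 4),
    "GT": (50, 7, 65, 4),
    "GC": (31, 18, 59, 1),
    "GA": (95, 29, 115, 7),
    "GG": (27, 8, 58, 10),
}


def third_position_totals(counts):
    """Sum codon counts by the base in the third codon position."""
    totals = {base: 0 for base in THIRD_BASES}
    for row in counts.values():
        for base, n in zip(THIRD_BASES, row):
            totals[base] += n
    return totals


def third_position_fractions(counts):
    """Fraction of all codons ending in each base."""
    totals = third_position_totals(counts)
    grand_total = sum(totals.values())
    return {base: n / grand_total for base, n in totals.items()}


if __name__ == "__main__":
    for base, fraction in third_position_fractions(TABLE3_COUNTS).items():
        print(f"codons ending in {base}: {fraction:.1%}")
```

With the counts as transcribed, codons ending in G make up 71 of 2964 codons, whereas codons ending in A or T together account for the large majority.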

FIG. 2. Inferred amino acid sequences for the COII gene of 13 species of insects.
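The translation step behind Fig. 2 can be sketched as follows. This is not the authors' procedure: as an assumption, the sketch uses the assignments of the NCBI invertebrate mitochondrial code (translation table 5) as a stand-in for the Drosophila mitochondrial code of de Bruijn (1983). It differs from the universal code in reading ATA as Met, TGA as Trp, and AGA and AGG as Ser; the AGG assignment follows the text above. The names build_code and translate are hypothetical.

```python
BASES = "TCAG"

# Universal genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
UNIVERSAL = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)

# Assumed mitochondrial reassignments (NCBI translation table 5).
MITO_CHANGES = {"ATA": "M", "TGA": "W", "AGA": "S", "AGG": "S"}


def build_code():
    """Return a codon -> amino acid mapping for the assumed mitochondrial code."""
    codons = [a + b + c for a in BASES for b in BASES for c in BASES]
    code = dict(zip(codons, UNIVERSAL))
    code.update(MITO_CHANGES)
    return code


def translate(sequence, code=None):
    """Translate complete codons of an ungapped DNA sequence.

    A trailing partial codon (for example the lone T of an incomplete stop
    codon) is ignored; codons containing other symbols are reported as 'X'.
    """
    code = code or build_code()
    sequence = sequence.upper()
    protein = []
    for start in range(0, len(sequence) - len(sequence) % 3, 3):
        protein.append(code.get(sequence[start:start + 3], "X"))
    return "".join(protein)


if __name__ == "__main__":
    # First five codons of the Drosophila sequence as read from Fig. 1.
    print(translate("ATGTCTACATGAGCT"))
```

Under the universal code the fourth codon of this example, TGA, would be read as a terminator, which illustrates why the choice of code matters for the inferred sequences.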

CO11

EVOLUTION

the sequences. This inferred change is from a negative to an uncharged polar amino acid residue. Codon 137, a conserved aspartic acid in other organisms, appears to have conservative replacements of glutamic acid in five of the species. In neither of these cases does the apparent substitution follow obvious phylogenetic patterns. Divergence A total of 468 of the 688 nucleotide sites (229 codons) were variable, including 154 first, 105 second, and 209 third codon positions. Corrected nucleotide and amino acid divergences are given in Table 4. Not surprisingly, divergences between different insect orders are quite high, ranging from about 26% (moth-flea) to over 55% (ant-termite). Perhaps more surprising is the observation that the within order comparisons are also high: about 45% divergence between the two beetle sequences, 37% between the wasp and the ant and 38% between the cricket and the locust. None of these within-order pairs are considered to be closely related pairs, based on traditional taxonomy. Divergences at the amino acid level are also high, ranging from about 23% to over 47%.


DISCUSSION

Comparative analysis of aligned homologous DNA sequences from related taxa provides a powerful way of studying evolution at the DNA sequence level. Some of the taxa examined here may have diverged more than 300 million years ago (Wooten, 1981). These long divergence times are reflected in the nucleotide and inferred amino acid sequence divergences, which range from 26 to 55% and 23 to 47%, respectively. Nonetheless, alignment using the translated sequences was unambiguous, except at the site(s) of the two-codon deletions in the wasp and cricket sequences, and about 14 amino acids at the carboxy terminus of the translated sequence (Fig. 2). Because of the difficulties in alignment at the 3’ end of the gene, in the analysis that follows we ignore the 3’ end, from position 640 to the ends of the genes.

COII Sequence Evolution

Before any attempt can be made to apply sequence data to phylogenetic analysis, it is necessary to examine the details of sequence evolution for the gene. It is well known that transitions occur at a much higher

TABLE 4

Interspecific Divergences in COII Gene Sequences in 13 Species of Insects

| | Drosophila | Moth | Flea | Weevil | Ladybird beetle | Wasp | Ant | Cricket | Locust | Cockroach | Termite | Milkweed bug | Dragonfly |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Drosophila | - | 26.63 54/99 | 29.54 55/111 | 40.24 78/134 | 40.75 64/151 | 41.29 66/146 | 45.33 69/161 | 31.02 72/98 | 35.77 73/120 | 31.12 70/104 | 37.22 84/116 | 37.97 74/130 | 31.29 70/106 |
| Moth | 23.25 53 | - | 25.94 37/112 | 37.06 68/131 | 34.04 62/126 | 37.52 61/136 | 40.90 67/146 | 27.97 70/86 | 41.66 70/147 | 32.23 61/119 | 41.42 86/131 | 35.67 73/121 | 35.90 78/117 |
| Flea | 30.26 69 | 25.44 58 | - | 43.30 82/143 | 38.51 59/146 | 38.26 55/145 | 39.89 58/152 | 33.79 71/111 | 41.15 83/132 | 38.02 64/139 | 44.61 90/139 | 35.16 62/129 | 36.58 72/125 |
| Weevil | 35.09 80 | 36.40 83 | 35.53 81 | - | 45.41 54/147 | 44.98 82/144 | 50.32 86/163 | 42.08 81/135 | 47.50 88/151 | 42.52 77/144 | 50.97 98/154 | 43.156 83/142 | 45.14 93/138 |
| Ladybird beetle | 37.28 85 | 34.21 78 | 38.60 88 | 36.84 84 | - | 43.38 65/155 | 48.90 75/168 | 35.83 77/114 | 48.34 85/157 | 41.51 78/141 | 48.04 93/149 | 43.30 74/151 | 45.67 99/135 |
| Wasp | 38.94 88 | 35.84 81 | 37.61 85 | 40.26 91 | 43.36 98 | - | 37.18 65/129 | 40.01 78/129 | 47.73 88/148 | 43.38 81/139 | 50.01 95/149 | 35.12 63/124 | 43.38 80/140 |
| Ant | 39.91 91 | 38.16 87 | 39.04 89 | 42.98 98 | 47.81 109 | 34.51 78 | - | 50.16 93/151 | 51.99 77/176 | 51.18 92/159 | 55.05 96/168 | 45.59 76/155 | 50.32 81/167 |
| Cricket | 23.24 53 | 20.61 47 | 28.95 66 | 37.72 86 | 33.77 77 | 34.51 78 | 38.77 88 | - | 38.71 72/130 | 37.75 84/115 | 43.56 95/128 | 40.72 80/131 | 39.72 92/115 |
| Locust | 32.90 75 | 37.72 86 | 36.84 84 | 42.98 98 | 42.11 96 | 43.36 98 | 45.37 103 | 31.58 72 | - | 39.64 83/126 | 43.47 99/125 | 43.47 92/132 | 42.69 82/139 |
| Cockroach | 28.51 65 | 28.51 65 | 35.53 81 | 37.72 86 | 37.72 86 | 40.71 92 | 44.05 100 | 31.14 71 | 38.77 88 | - | 31.24 96/78 | 36.61 86/112 | 36.37 83/114 |
| Termite | 33.33 76 | 32.90 75 | 36.40 83 | 41.67 95 | 40.35 92 | 39.82 90 | 43.17 98 | 31.58 72 | 40.53 92 | 23.68 54 | - | 41.16 95/121 | 41.16 103/113 |
| Milkweed bug | 32.02 73 | 29.83 68 | 32.46 74 | 36.40 83 | 37.28 85 | 35.84 81 | 40.53 92 | 30.70 70 | 38.77 88 | 32.90 75 | 32.02 73 | - | 42.44 94/128 |
| Dragonfly | 30.26 69 | 31.14 71 | 35.09 80 | 38.16 87 | 41.67 95 | 41.59 94 | 44.05 100 | 31.14 71 | 38.33 87 | 32.75 75 | 34.21 78 | 33.33 76 | - |

Note. Percentage DNA sequence divergence using the Jukes-Cantor correction and the number of nucleotide differences in terms of transitions/transversions are given above the diagonal. Below the diagonal are the inferred amino acid divergences and the number of amino acid substitutions.

frequency than transversions in mitochondrial DNA, but this excess is evident only in comparisons involving very closely related species (Brown, 1985). For more distant comparisons, the occurrence of multiple transition mutations at variable sites and the accumulation of transversions, which erase transition differences (DeSalle et al., 1987; Brown et al., 1982), can eliminate the transition bias.

Figure 3 shows the proportion of transitions in pairwise sequence comparisons of the COII gene for a broad spectrum of relationships. Data points for divergences of less than 12% are within-group comparisons of the Drosophila obscura group and comparisons between D. yakuba and members of the D. obscura species group (Beckenbach et al., submitted for publication). It appears that the transition excess is essentially eliminated in comparisons of sequences that are only 12% diverged. Pairwise comparisons of sequences presented here (Fig. 1 and Table 4) range from 25 to 55% divergent. The proportion of transitions in these comparisons averages 36% and appears to be stable throughout this range. To determine whether this value is a limiting value, we have extended the comparisons to include those between D. pseudoobscura and the published COII sequences of sea urchin (Jacobs et al., 1988), frog (Roe et al., 1985), cow (Anderson et al., 1982), mouse (Bibb et al., 1981), and human (Anderson et al., 1981). These values (solid dots, Fig. 3), having divergences between 47 and 65%, show no further reduction in the proportion of transitions observed.

Holmquist (1983) has analyzed the limiting ratios for transversions to transitions, R, based on observed base composition. Using his formula 11, we calculate an R of 2.50, corresponding to a proportion of transitions of 28.6%. The main conclusions from this analysis are that the proportion of observed transitions stabilizes at a value somewhat higher than that predicted solely on the basis of base compositions, presumably because of functional constraints at the protein level, and that this limiting value holds for all divergences greater than 25%. Thus, all comparisons in this study appear to have reached this limiting value. It is clear that transitions cannot contribute phylogenetic information in these comparisons, and it is possible that even transversions have reached, or are approaching, a level of saturation.

Greater resolution may be obtained by analyzing the changes according to codon position. Transitions and transversions can have very different effects, depending on codon position. Table 5 gives the base composition and bias by codon position for each of the insect COII sequences examined here. Figure 4 summarizes these results. The A + T bias observed in other insect mtDNA sequences occurs at all positions, but is most extreme at the third position. If one assumes that the A + T bias is maintained selectively (16), then one would expect less constraint against synonymous transversions (A ↔ T) than transitions at the third codon position. If we recalculate R by codon position, weighted by the number of observed variable sites, the value increases to 3.13, corresponding to a transition frequency of 24.2%. This value is even farther from the observed value (36%) than the value we obtained by ignoring codon positions (28.6%). We suggest that the transition excess relative to random expectation, based on base compositions, occurs mostly at the first and second positions, indicating that amino acid substitutions generated by transitions are less severe than those generated by transversions.
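The conversion between a transversion-to-transition ratio and the expected proportion of transitions used in this passage is simple arithmetic: a ratio R implies a proportion 1 / (1 + R). The sketch below, with hypothetical function names, reproduces the two quoted percentages and shows the same calculation from raw counts, using the Drosophila-moth entry of Table 4 (54 transitions, 99 transversions) as an example.

```python
def transition_proportion_from_ratio(r):
    """Proportion of transitions implied by r = transversions / transitions."""
    if r < 0:
        raise ValueError("ratio must be non-negative")
    return 1.0 / (1.0 + r)


def transition_proportion_from_counts(transitions, transversions):
    """Observed proportion of transitions among all counted differences."""
    total = transitions + transversions
    if total == 0:
        raise ValueError("no differences to classify")
    return transitions / total


# Limiting ratios quoted in the text.
for ratio in (2.50, 3.13):
    pct = 100 * transition_proportion_from_ratio(ratio)
    print(f"R = {ratio:.2f} -> {pct:.1f}% transitions")

# One observed pairwise comparison from Table 4 (Drosophila-moth).
observed = 100 * transition_proportion_from_counts(54, 99)
print(f"Drosophila-moth: {observed:.1f}% transitions")
```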

FIG. 3. The proportion of transitions in pairwise comparisons of the COII gene for a broad spectrum of relationships, plotted against nucleotide divergence (%). The percentage nucleotide divergences are corrected using the Jukes-Cantor method. Solid squares indicate the within-group comparisons of the Drosophila obscura group and comparisons between D. yakuba and members of the D. obscura species group (Beckenbach et al., submitted). Open squares indicate the comparisons between all the insects studied here. Solid circles indicate the extended comparisons between D. pseudoobscura and published urchin, frog, mouse, cow, and human COII sequences, and comparisons of the urchin to the mammal sequences.

Phylogenetic Inferences

Computer programs will generate trees from sequence data, regardless of whether phylogenetic information is present in the sequences. Trees can be generated by using the complete data set or by using restricted subsets of the data. DeSalle et al. (1987) and Beckenbach et al. (submitted for publication) found that by confining attention to transversion differences, reasonable resolution of relationships for time spans of tens of millions of years can be obtained. Edwards et al. (1991) and Irwin et al. (1991) found that the inclusion of third position changes can obscure relationships in deep phylogenies. Moritz et al. (1987) suggested that resolution of deep phylogenies may be achieved by confining attention to the second codon position differences. They reasoned that second position substitutions result in nonconservative amino acid substitutions, while those arising from first or third position substitutions are often chemically conservative.

In view of the apparent saturation in our data set,

TABLE 5

Base Composition at First, Second, and Third Codon Positions

| Species | First G | First A | First T | First C | Second G | Second A | Second T | Second C | Third G | Third A | Third T | Third C |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Dragonfly | 25.65 | 33.04 | 22.61 | 18.70 | 12.66 | 29.69 | 37.55 | 20.09 | 7.42 | 45.41 | 37.55 | 9.61 |
| Milkweed bug | 20.00 | 40.43 | 22.17 | 17.39 | 12.17 | 29.13 | 37.83 | 20.87 | 1.30 | 49.57 | 38.26 | 10.87 |
| Locust | 27.19 | 29.82 | 25.88 | 17.11 | 14.10 | 25.99 | 37.00 | 22.91 | 3.96 | 52.86 | 31.72 | 11.45 |
| Cricket | 19.30 | 35.09 | 27.19 | 18.42 | 11.45 | 26.87 | 40.09 | 21.59 | 1.76 | 44.05 | 36.56 | 17.62 |
| Cockroach | 25.33 | 35.81 | 22.27 | 16.59 | 13.10 | 26.64 | 40.61 | 19.65 | 1.75 | 49.78 | 37.12 | 11.35 |
| Termite | 21.40 | 38.43 | 20.96 | 19.21 | 12.72 | 28.95 | 36.40 | 21.93 | 2.19 | 54.39 | 21.93 | 21.49 |
| Drosophila | 23.91 | 30.00 | 29.57 | 16.52 | 14.41 | 26.64 | 41.05 | 17.90 | 1.75 | 45.85 | 47.16 | 5.24 |
| Moth | 17.47 | 37.99 | 27.95 | 16.59 | 12.66 | 29.26 | 41.48 | 16.59 | 0.87 | 34.50 | 58.95 | 5.68 |
| Flea | 14.47 | 36.40 | 33.33 | 15.79 | 12.72 | 28.51 | 39.91 | 18.86 | 1.32 | 42.54 | 53.51 | 2.63 |
| Weevil | 17.11 | 36.40 | 23.68 | 22.81 | 13.16 | 27.63 | 39.91 | 19.30 | 1.32 | 41.67 | 40.35 | 16.67 |
| Ladybird beetle | 17.90 | 31.44 | 36.68 | 13.97 | 13.10 | 27.51 | 36.68 | 22.71 | 3.49 | 41.92 | 47.60 | 6.99 |
| Wasp | 13.33 | 46.67 | 25.33 | 14.67 | 10.71 | 29.91 | 41.96 | 17.41 | 0.89 | 40.62 | 47.32 | 11.16 |
| Ant | 12.33 | 37.89 | 33.92 | 15.86 | 10.57 | 27.75 | 41.85 | 19.82 | 3.08 | 37.00 | 49.34 | 10.57 |
| Mean | 19.65 | 36.11 | 27.04 | 17.20 | 12.58 | 28.04 | 39.41 | 19.97 | 2.39 | 44.63 | 42.11 | 10.87 |
| Bias (a) | 0.175 | | | | 0.233 | | | | 0.490 | | | |

(a) Bias in base composition is calculated as $C = \frac{2}{3}\sum_i |c_i - 0.25|$, where C is the compositional bias and $c_i$ is the frequency of the ith base.

FIG. 4. Average base compositions of 13 insect mitochondrial COII genes at first, second, and third codon positions.
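The compositional bias defined in the footnote to Table 5, C = (2/3) Σ |c_i − 0.25|, can be checked against the mean base frequencies reported in that table. The following is a minimal sketch; the function name is a hypothetical choice, and the frequencies are the Table 5 means expressed as fractions in the order G, A, T, C.

```python
def compositional_bias(freqs):
    """Return C = (2/3) * sum(|c_i - 0.25|) for four base frequencies given as fractions."""
    if len(freqs) != 4:
        raise ValueError("expected four base frequencies (G, A, T, C)")
    return (2.0 / 3.0) * sum(abs(c - 0.25) for c in freqs)


# Mean base compositions from Table 5, order G, A, T, C.
mean_composition = {
    "first": [0.1965, 0.3611, 0.2704, 0.1720],
    "second": [0.1258, 0.2804, 0.3941, 0.1997],
    "third": [0.0239, 0.4463, 0.4211, 0.1087],
}

for position, freqs in mean_composition.items():
    print(f"{position} codon position: C = {compositional_bias(freqs):.3f}")
```

With this scaling, C is 0 for equal base frequencies and 1 when a single base makes up the whole sequence, so the third-position value of about 0.49 sits roughly halfway between those extremes.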

not only of transitions, but perhaps of transversions as well, we examine the neighbor-joining trees produced by using each of the following series of subsets of the data: (1) all substitutions; (2) transversions only; (3) inferred amino acid substitutions; and (4) second codon position differences. Edwards et al. (1991) and Irwin et al. (1991) used first and second codon positions to analyze relatively deep phylogenies. This procedure includes some silent first position substitutions and ignores some nonsilent substitutions at third codon positions. By comparing the inferred amino acid sequences, we include all nonsilent differences and ignore all silent nucleotide substitutions. Some amino acid substitutions can arise from a single nonsilent nucleotide substitution, while others require two or even three nucleotide changes. To partially account for these differences, we weight the amino acid substitutions by the minimum number of nucleotide substitutions required, based on the insect mitochondrial code.

We compare the resulting trees to those based on consideration of morphology and the fossil record. There are a number of relationships that are widely accepted. Each of the orders included in this study is believed to represent a monophyletic grouping. The orders of holometabolous insects (including Coleoptera, Diptera, Hymenoptera, Lepidoptera, and Siphonaptera, studied here) are also believed to constitute a monophyletic grouping. The Blatteria (cockroaches) are closely related to the Isoptera (termites). The Odonata (dragonflies) should represent an outgroup to the other nine orders included here.

Figure 5 gives the neighbor-joining trees resulting from the complete data set and conserved subsets of

FIG. 5. Neighbor-joining trees constructed from the nucleotide and inferred amino acid sequences. The published honeybee COII sequence is included in the analysis. All trees were obtained by the NEIGHBOR program in PHYLIP 3.4 (Felsenstein, 1989). The trees are based on (A) all nucleotide sites, (B) transversion differences, (C) amino acid sequence divergences, and (D) second codon position divergences. In all trees, the dragonfly is specified as an outgroup.

the data. In all trees, Odonata was specified as an outgroup. The published honeybee COII sequence (Crozier et al., 1989) is included in this analysis. All of the trees are different. The only relationships supported by all trees are the close affinities of the cockroaches and termites and the monophyly of the Hymenoptera. Some of the trees fail to support monophyly of some orders. The two Coleoptera, a ladybird beetle and a weevil, are not clustered when all data are used (Fig. 5A) or when only transversions are considered (Fig. 5B). The two members of Orthoptera, a cricket and a locust, are separated when only transversions (Fig. 5B) or amino acid sequences (Fig. 5C) are used.
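Three of the data subsets compared in these trees (all substitutions, transversions only, and second codon positions) can be illustrated with a short sketch. This is not the authors' pipeline: the function names and the two toy sequences are hypothetical, and the sequences are assumed to be aligned, in frame, and of equal length.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_ts_tv(seq_a, seq_b):
    """Count transitions and transversions between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x == y or x not in "ACGT" or y not in "ACGT":
            continue  # identical site, gap, or ambiguous base
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1  # purine <-> purine or pyrimidine <-> pyrimidine
        else:
            transversions += 1  # purine <-> pyrimidine
    return transitions, transversions


def codon_position(seq, position):
    """Extract every base at codon position 1, 2, or 3 of an in-frame sequence."""
    if position not in (1, 2, 3):
        raise ValueError("position must be 1, 2, or 3")
    return seq[position - 1::3]


# Toy aligned sequences (hypothetical, three codons each).
seq_a = "ATGGCATTA"
seq_b = "ATAGTATTT"

print("all sites (ts, tv):", count_ts_tv(seq_a, seq_b))
print("transversions only:", count_ts_tv(seq_a, seq_b)[1])
second_a, second_b = codon_position(seq_a, 2), codon_position(seq_b, 2)
print("second positions (ts, tv):", count_ts_tv(second_a, second_b))
```

Pairwise counts of this kind would then be converted to distances and passed to a tree-building program; that step is not shown here.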

In all trees, there is a mix of the holometabolous with the hemimetabolous orders. The latter are believed to have diverged prior to the radiation of the holometabolous groups. In the trees based on amino acid divergence and second codon positions (Figs. 5C and 5D), the cockroach/termite pair is separated from the holometabolous orders, while the cricket, locust, and milkweed bug are not. It is possible, of course, that these results are due to a polyphyletic origin of the holometabolous insects. Considerably more information will be required before such inferences can be made.

We also examined trees based on each of the data subsets, but without specifying an outgroup (trees not shown). Networks that include a number of highly diverged taxa will generally be rooted by the longest branch (Wheeler, 1990), which depends not only on time but also on evolutionary rates. The results showed no improvement in discerning relationships among these orders.

It appears from this analysis that these data provide little basis for confirming or rejecting any particular relationships among the orders, or even for confirming monophyly of the orders themselves. The only exceptions are the close relationships of cockroaches with termites and relationships within the order Hymenoptera. The tree supporting the greatest proportion of our preconceived ideas of insect relationships is that based on the most conserved subset of data, the second codon positions (Fig. 5D).

Figure 6 gives bootstrap trees based on second codon positions. Figure 6A is based on a neighbor-joining distance tree, while 6B is a bootstrap using the maximum parsimony tree. The dragonfly was used as the outgroup. In both trees, the termite-cockroach cluster is strongly supported. In addition, monophyly of the Hymenoptera is supported in the distance tree, but not in the tree based on parsimony. The distance tree weakly supports monophyly of the Orthoptera (cricket and locust) and Coleoptera (ladybird beetle and weevil), but only the latter is supported by the parsimony tree.

FIG. 6. Bootstrap trees based on the second codon positions. Both trees were done using 100 replicates with the dragonfly as the outgroup. (A) Bootstrap of distance matrices, using the neighbor-joining method. (B) Parsimony bootstrap using DNABOOT of PHYLIP 3.4 (Felsenstein, 1989).

CONCLUSIONS

The COII gene is a highly conserved protein coding gene in the mitochondrial genome. We initially hoped that sequence comparisons, using the translated sequences or second codon position variation, would provide a basis for inferring the phylogenetic relationships of the holometabolous orders of insects and their positions relative to several hemimetabolous orders. It appears, however, that the divergence of insect orders is so ancient that it is not easily resolved using this gene. More extensive analyses involving many additional taxa, using COII or other conserved protein coding genes, may provide the necessary resolution to answer these questions and will certainly provide a greater understanding of the evolution of the genes.

ACKNOWLEDGMENTS

We thank A. Syed for providing some of the insects used in this study. K. A. Beckenbach, M. J. Smith, and two anonymous reviewers provided valuable comments on this paper. E. G. Boulding provided assistance with the PHYLIP programs. This work was funded by an operating grant from NSERC of Canada (to A.T.B.) and a Special Program grant from SFU.

REFERENCES

Anderson, S., Bankier, A. T., Barrell, B. G., de Bruijn, M. H. L., Coulson, A. R., Drouin, J., Eperon, I. C., Nierlich, D. P., Roe, B. A., Sanger, F., Schreier, P. H., Smith, A. J. H., Staden, R., and Young, I. G. (1981). Sequence and organization of the human mitochondrial genome. Nature 290: 457-465.

Anderson, S., de Bruijn, M. H. L., Coulson, A. R., Eperon, I. C., Sanger, F., and Young, I. G. (1982). Complete sequence of bovine mitochondrial DNA. J. Mol. Biol. 156: 683-717.

Beckenbach, A. T., Wei, Y. W., and Liu, H. Evolution of the mitochondrial cytochrome oxidase II gene in the Drosophila obscura species group. Submitted for publication.

Bibb, M. J., Van Etten, R. A., Wright, C. T., Walberg, M. W., and Clayton, D. A. (1981). Sequence and gene organization of mouse mitochondrial DNA. Cell 26: 167-180.

Boudreaux, H. B. (1979). “Arthropod Phylogeny: With Special Reference to Insects,” Wiley, New York.

Brown, W. M. (1985). The mitochondrial genome of animals. In “Molecular Evolutionary Genetics” (R. J. MacIntyre, Ed.), pp. 95-130, Plenum, New York.

Brown, W. M., Prager, E. M., Wang, A., and Wilson, A. C. (1982). Mitochondrial DNA sequences of primates: Tempo and mode of evolution. J. Mol. Evol. 18: 225-239.

de Bruijn, M. H. L. (1983). Drosophila melanogaster mitochondrial DNA, a novel gene organization and genetic code. Nature 304: 234-241.

Cabot, E. L., and Beckenbach, A. T. (1989). Simultaneous editing of multiple nucleic acid and protein sequences with ESEE. Comput. Appl. Biosci. 5: 233-234.

Clary, D. O., and Wolstenholme, D. R. (1985). The mitochondrial DNA molecule of Drosophila yakuba: Nucleotide sequence, gene organization and genetic code. J. Mol. Evol. 22: 252-271.

Crozier, R. H., Crozier, Y. C., and Mackinlay, A. G. (1989). The CO-I and CO-II region of honeybee mitochondrial DNA: Evidence for variation in insect mitochondrial evolutionary rates. Mol. Biol. Evol. 6: 399-411.

DeSalle, R., Freedman, T., Prager, E. M., and Wilson, A. C. (1987). Tempo and mode of sequence evolution in mitochondrial DNA of Hawaiian Drosophila. J. Mol. Evol. 26: 157-164.

Edwards, S. V., Arctander, P., and Wilson, A. C. (1991). Mitochondrial resolution of a deep branch in the genealogical tree for perching birds. Proc. R. Soc. Lond. B 243: 99-107.

Felsenstein, J. (1989). PHYLIP-Phylogeny inference package (version 3.2). Cladistics 5: 164-166.

Hennig, W. (1981). “Insect Phylogeny,” Wiley, New York.

Holmquist, R. (1983). Transition and transversions in evolutionary descent: An approach to understanding. J. Mol. Evol. 19: 134-144.

Irwin, D. M., Kocher, T. D., and Wilson, A. C. (1991). Evolution of the cytochrome b gene of mammals. J. Mol. Evol. 32: 128-144.

Jacobs, H. T., Elliott, D. J., Math, V. B., and Farquharson, A. (1988). Nucleotide sequence and gene organization of sea urchin mitochondrial DNA. J. Mol. Biol. 202: 185-217.

Jukes, T. H., and Cantor, C. R. (1969). Evolution of protein molecules. In “Mammalian Protein Metabolism” (H. N. Munro, Ed.), Vol. 3, pp. 21-126, Academic Press, New York.

Kimura, M. (1980). A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J. Mol. Evol. 16: 111-120.

Kristensen, N. P. (1981). Phylogeny of insect orders. Annu. Rev. Entomol. 26: 135-157.

Millett, F., de Jong, C., Paulson, L., and Capaldi, A. (1983). Identification of specific carboxylate groups on cytochrome c oxidase that are involved in binding cytochrome c. Biochemistry 22: 546-552.

Moritz, C., Dowling, T. E., and Brown, W. M. (1987). Evolution of animal mitochondrial DNA: Relevance for population biology and systematics. Annu. Rev. Ecol. Syst. 18: 269-292.

Roe, B. A., Ma, D., Wilson, R. K., and Wong, J. H. (1985). The complete nucleotide sequence of the Xenopus laevis mitochondrial genome. J. Biol. Chem. 260(17): 9759-9774.

Saitou, N., and Nei, M. (1987). The neighbor-joining method: A new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4: 406-425.

Wheeler, W. C. (1990). Nucleic acid sequence phylogeny and random outgroups. Cladistics 6: 363-367.

Wooten, R. J. (1981). Paleozoic insects. Annu. Rev. Entomol. 26: 319-344.
