The complete mitochondrial genome of the endangered Apollo butterfly, Parnassius apollo (Lepidoptera: Papilionidae) and its comparison to other Papilionidae species
Yan-hong Chen, Dun-yuan Huang, Yun-liang Wang, Chao-dong Zhu, Jia-sheng Hao
PII: S0014-4886(14)00197-6
DOI: 10.1016/j.expneurol.2014.06.010
Reference: YEXNR 11762
To appear in: Experimental Neurology
Please cite this article as: Chen, Yan-hong, Huang, Dun-yuan, Wang, Yunliang, Zhu, Chao-dong, Hao, Jia-sheng, The complete mitochondrial genome of the endangered Apollo butterfly, Parnassius apollo (Lepidoptera: Papilionidae) and its comparison to other Papilionidae species, Experimental Neurology (2014), doi: 10.1016/j.expneurol.2014.06.010
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
ACCEPTED MANUSCRIPT
The complete mitochondrial genome of the endangered Apollo butterfly, Parnassius apollo (Lepidoptera: Papilionidae) and its comparison to other Papilionidae species
Chen Yan-hong1, Huang Dun-yuan1,2,3, Wang Yun-liang1, Zhu Chao-dong2, Hao Jia-sheng1
1. Laboratory of Molecular Evolution and Biodiversity, College of Life Sciences, Anhui Normal University, Wuhu, P.R. China 2. Institute of Zoology, Chinese Academy of Sciences, Beijing, P. R. China 3. College of Forestry, Jiangxi Environmental Engineering Vocational College, Ganzhou, P. R. China
Abstract
The Apollo butterfly, Parnassius apollo, is a representative species of the butterfly subfamily Parnassiinae and one of the most endangered butterfly species in the world. In this study, we sequenced its complete mitochondrial genome (mitogenome), with the aim of accumulating genetic information for further studies of population genetics and mitogenome evolution in the Papilionidae. The 15,404-bp mitogenome harbors the typical set of 37 genes and is the largest butterfly mitogenome determined to date, except for that of P. maraho (16,094 bp). As in many other sequenced lepidopteran species, one tRNATrp-like and one tRNALeu(UUR)-like sequence were detected in the AT-rich region. A total of 164 bp of non-coding sequence is dispersed in 14 regions throughout the genome. The longest intergenic spacer (68 bp) is located between tRNASer(AGN) and tRNAGlu, and is the largest spacer at this location among Papilionidae species. This spacer may have resulted from an 8-fold repetition of a TTTCTTCT motif or a 4-fold repetition of a CTTTATTT motif.
Key words: Parnassiinae; Parnassius apollo; Mitochondrial genome; tRNA-like sequence; Intergenic spacer
Introduction
The Apollo butterfly, Parnassius apollo, is distributed mainly in the mountainous areas of Europe and northwestern China. It is one of the largest Parnassius species, with a wingspan of about 70 mm (Carter, 2000). Adults bear large black eye-spots on the forewings and red eye-spots on the hind wings. The size and color of these striking eye-spots vary with habitat, and a variety of subspecies have evolved in different areas. Owing mainly to over-collection, habitat loss, destruction of its host plants (Sedum and Sempervivum species) and climate change, the Apollo butterfly is now endangered in some of its habitats (Collins and Morris, 1985). It has consequently been listed in Appendix II of CITES (Collins and Morris, 1985; Still, 1996), assessed as vulnerable (VU) on the IUCN Red List (Gimenez, 1996), and listed as a Grade-II protected species by the Chinese government.
The typical insect mitochondrial genome (mitogenome) is a circular molecule about 15-16 kb long and contains 37 genes, namely 13 protein-coding genes (PCGs), 22 transfer RNA (tRNA) genes and 2 ribosomal RNA (rRNA) genes, together with a non-coding region (i.e., the control region or AT-rich region) (Wolstenholme, 1992; Boore, 1999). Because of its maternal inheritance, strict orthology, lack of recombination and accelerated evolutionary rate compared with the nuclear genome, the mitogenome has become popular in comparative and evolutionary genomics, molecular evolution, phylogenetics, and population genetics (Nardi et al., 2003, 2005; Simon et al., 2006; Cameron, 2014). To date, 45 complete or nearly complete mitogenome sequences of true butterflies (superfamily Papilionoidea) have been reported, including 8 species from the family Papilionidae. Within the subfamily Parnassiinae, only two complete mitogenome sequences are available (Kim et al., 2009; Ji et al., 2012). Thus, more mitogenomic data from representative species, especially endangered ones, are important for studies of Papilionoidea phylogeny and ecology. In this study, the complete mitogenome sequence of P. apollo was determined using long PCR and conserved primer walking, and the sequence was analyzed to determine gene arrangement, nucleotide composition and secondary structures, with the aim of providing key molecular data for studies of its population genetics, conservation biology, molecular ecology and historical biogeography. Furthermore, the mitogenome sequence was compared with those of the other available Papilionidae species to improve our understanding of molecular evolution within Papilionidae mitogenomes.
Materials and methods
Sample collection and DNA extraction
An adult individual of P. apollo was collected in the Tianshan Mountains, Xinjiang, China, in July 2012. After collection, the sample was immediately preserved in 100% ethanol and stored at -20°C until DNA extraction. Total genomic DNA was extracted from thorax muscle using a DNA extraction kit (Sangon, China) according to the manufacturer's instructions.
PCR amplification and sequencing
To sequence the full-length mitogenome of P. apollo, 13 primer pairs were used to amplify 6 short fragments and 7 long fragments (Fig. 1). Three short fragments (SF2, SF4, SF5) and one long fragment (LF4) were amplified using universal PCR primers from Caterino and Sperling (1999), Simmons and Weller (2001), and Zhao et al. (2013). The other primers, including those for the AT-rich region, were designed from multiple sequence alignments of known lepidopteran mitochondrial sequences, using Clustal X 1.83 (Thompson et al., 1997) and Primer Premier 5.0 (Singh et al., 1998) (Table 1).
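The number of mismatches between a primer and its binding site, as reported for each primer in Table 1, can be counted with a short script. The following is a minimal sketch, not the authors' procedure: the function names `count_mismatches` and `reverse_complement` are illustrative, and it assumes the primer and target site are already aligned and of equal length. Degenerate bases such as the W in REVCB2H are handled through the IUPAC nucleotide codes.

```python
# Hypothetical helper functions for illustration; not taken from the paper.

# IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Complement table that also covers the degenerate codes.
_COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (IUPAC-aware)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def count_mismatches(primer: str, target: str) -> int:
    """Count positions where the target base is not allowed by the primer base.

    Both sequences must be written 5'-3' on the same strand and have equal
    length; a reverse primer should be reverse-complemented first.
    """
    if len(primer) != len(target):
        raise ValueError("primer and target must have the same length")
    return sum(t not in IUPAC[p] for p, t in zip(primer.upper(), target.upper()))
```

For a reverse primer, the binding site on the majority strand would be compared with `reverse_complement(primer)` rather than with the primer itself.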
Fig. 1. Circular map of the mitochondrial genome of Parnassius apollo. The abbreviations for the genes are as follows: COI, COII, and COIII refer to the cytochrome oxidase subunits, CytB refers to cytochrome B, ATP6 and ATP8 refer to subunits 6 and 8 of F0 ATPase, and ND1-6 refer to components of NADH dehydrogenase. tRNAs are indicated by the IUPAC-IUB single-letter amino acid codes, while L1, L2, S1, S2 denote tRNALeu(CUN), tRNALeu(UUR), tRNASer(AGN) and tRNASer(UCN), respectively. Gene names that are not underlined indicate transcription on the majority strand, whereas underlined names indicate transcription on the minority strand. The P. apollo mitogenome was sequenced from 6 short fragments (SF1-SF6) and 7 long fragments (LF1-LF7), shown as single lines within the circle.
Table 1 List of primers used to amplify and sequence the mitogenome of Parnassius apollo.

| Fragment name | Primer name | Direction(e) | Sequence (5'-3') | Nucleotide position(f) | Mismatch(g) | Annealing temperature (°C) |
|---|---|---|---|---|---|---|
| Short fragments | | | | | | |
| SF1 | ND2-F(a) | F | CGTTCATTTCTATTTCAGC | 298-316 | 2 | 47.3 |
| | ND2-R(a) | R | ACACCACCTATTGTTCCTA | 718-736 | 1 | |
| SF2 | k698(b) | F | TACAATTTATCGCCTAAACTTCAGCC | 1699-1721 | 3 | 46.9 |
| | k807(b) | R | TGAAAATGAGCTACAACATAATA | 2548-2570 | 0 | |
| SF3 | COIII-F(a) | F | ATCTCAATGATGACGAGAT | 4859-4878 | 1 | 46.8 |
| | COIII-R(a) | R | CAAATCCAAAATGGTGAGTA | 5386-5405 | 3 | |
| SF4 | ND5-F(c) | F | AAAACTTCCAGAAAATAATCTC | 6786-6807 | 5 | 46.5 |
| | ND5-R(c) | R | TTGCTTTATCTACTTTAAGACA | 7261-7282 | 1 | |
| SF5 | REVCB2H(d) | F | TGAGGACAAATATCATTTTGAGGW | 10895-10918 | 1 | 45.0 |
| | REVCBJ(d) | R | ACTGGTCGAGCTCCAATTCATGT | 11498-11520 | 3 | |
| SF6 | lrRNA-F(a) | F | TACGCTGTCATCCCTAA | 12976-12992 | 1 | 47.5 |
| | lrRNA-R(a) | R | AAGTCTAATCTGCCCAC | 13337-13353 | 0 | |
| Long fragments | | | | | | |
| LF1 | ND2-COI-F(a) | F | CCCTTTCATTTCTGATTCC | 564-582 | 1 | 51.1 |
| | ND2-COI-R(a) | R | ACTGTTCGTCCTGTTCCT | 1803-1820 | 2 | |
| LF2 | COI-COIII-F(a) | F | TCACAAGAAAGTGGAAAA | 2212-2229 | 0 | 47.2 |
| | COI-COIII-R(a) | R | TCTCTCATCGTAAGCCT | 4929-4945 | 3 | |
| LF3 | COIII-ND5-F(a) | F | GCTGATAGTATTTATGGTTC | 5263-5282 | 0 | 47.7 |
| | COIII-ND5-R(a) | R | TTGTATGTGCTGGAGTT | 7152-7168 | 1 | |
| LF4 | ND5-CytB-F(c) | F | AATTATACCAGCACATAT | 7149-7166 | 3 | 47.1 |
| | ND5-CytB-R(c) | R | TTATCGACTGCAAATC | 10992-11007 | 2 | |
| LF5 | CytB-lrRNA-F(a) | F | TCCTGCTAACCCTTTAGTCA | 11263-11282 | 3 | 49.2 |
| | CytB-lrRNA-R(a) | R | GAGTATTTTGTTGGGGT | 13082-13098 | 0 | |
| LF6 | lrRNA-srRNA-F(a) | F | CTGGGGTCTTCTCGTCT | 13158-13174 | 2 | 51.6 |
| | lrRNA-srRNA-R(a) | R | GCAATAAGTTGGCGGTA | 14495-14511 | 3 | |
| LF7 | srRNA-ND2-F(a) | F | GAAACACTTTCCAGTACCT | 14139-14157 | 0 | 49.7 |
| | srRNA-ND2-R(a) | R | CTAAACCAATTCAACATCC | 330-348 | 1 | |

a Primers newly designed for this genome. b Primers from Caterino and Sperling (1999). c Primers from Zhao et al. (2013). d Primers from Simmons and Weller (2001). e F and R, forward and reverse direction of transcription. f Nucleotide positions are with respect to the Parnassius apollo mitogenome. g Mismatches are with respect to the Parnassius apollo mitogenome.

Short-fragment PCR was performed using Taq DNA polymerase (Takara Bio, Otsu, Shiga, Japan) in a 50 µL reaction mixture (10× buffer 6.0 µL, 2.5 mmol/L MgCl2 8.0 µL, 0.2 µg/µL BSA 5.0 µL, 2.5 mmol/L dNTPs 1.5 µL, 0.1 µmol/L primers (each direction) 1.8 µL, 1.0 units Taq DNA polymerase and 1.5 µL of template DNA). The cycling parameters were as follows: 1 min at 94°C; 35 cycles of 1 min at 94°C, 1 min at 45-48°C and 2-2.5 min at 72°C; and a final elongation of 10 min at 72°C. Long-fragment PCR was performed using LA Taq DNA polymerase (Takara Bio, Otsu, Shiga, Japan), also in a 50 µL reaction mixture (10× LA PCR Buffer I (Mg2+ plus) 5 µL, 2.5 mmol/L MgCl2 3 µL, 2.5 mmol/L dNTP mix 8.0 µL, 0.1 µmol/L primers (each direction) 1.5 µL, 1.0 units LA Taq DNA polymerase and 1.5 µL of template DNA). The cycling parameters were as follows: 5 min at 95°C; 30 cycles of 55 s at 95°C, 2 min at 47-52°C and 2-2.5 min at 68°C; and a final elongation of 10 min at 68°C. The PCR products were separated by electrophoresis in a 1.2% agarose gel and purified using a DNA gel extraction kit (TaKaRa). All PCR fragments were sequenced directly after purification with the QIAquick PCR Purification Kit (QIAGEN). All fragments were sequenced on both strands by primer walking.
Data analysis
The sequences obtained were assembled, and sequence information was compiled, using BioEdit version 4.8.9 (Hall, 1999). PCGs and rRNA genes were identified by comparing their similarity to published insect mitochondrial sequences using ClustalX 1.8 (Thompson et al., 1997) and MEGA 5.0 (Tamura et al., 2011). The predicted secondary structures of both the lrRNA and the srRNA were drawn according to models proposed for these genes in other insects (Gillespie et al., 2006; Cameron and Whiting, 2008). The secondary structures of the tRNA genes were predicted with the aid of tRNAscan-SE 1.21 using invertebrate codon predictors and a cove score cut-off of 1 (Lowe and Eddy, 1997). The tRNAs not found by tRNAscan-SE were identified by comparing the P. apollo nucleotide sequence with the regions coding for these tRNAs in other insects. All tRNAs were folded by hand, using the tRNAscan-SE output as a template when possible. Nucleotide composition was calculated using MEGA 5.0 (Tamura et al., 2011). The bias in nucleotide composition was measured as AT-skew and GC-skew, defined as (A%-T%)/(A%+T%) and (G%-C%)/(G%+C%), respectively (Perna and Kocher, 1995). The AT-rich region was identified by alignment with the homologous regions of known full-length insect mitogenome sequences, and the tandem repeats in the AT-rich region were predicted with the Tandem Repeats Finder available online (http://tandem.bu.edu/trf/trf.html) (Benson, 1999). The complete mtDNA sequence of P. apollo was deposited in GenBank under accession no. KF746065.
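The skew definitions above translate directly into code. The following is a minimal sketch under stated assumptions: the function name `base_skews` is illustrative, the input is a plain DNA string for one strand, and ambiguous bases are simply ignored. The authors used MEGA 5.0 for nucleotide composition, so this illustrates the formulas rather than their workflow.

```python
from collections import Counter


def base_skews(seq: str) -> dict:
    """Return A+T content, AT-skew and GC-skew for one strand of DNA.

    AT-skew = (A - T) / (A + T) and GC-skew = (G - C) / (G + C), following
    the definitions of Perna and Kocher (1995). Characters other than
    A, C, G and T are ignored.
    """
    counts = Counter(seq.upper())
    a, t, g, c = (counts[base] for base in "ATGC")
    total = a + t + g + c
    if a + t == 0 or g + c == 0:
        raise ValueError("sequence needs at least one A/T and one G/C base")
    return {
        "at_content": (a + t) / total,
        "at_skew": (a - t) / (a + t),
        "gc_skew": (g - c) / (g + c),
    }
```

Because the counts are taken from the strand supplied, skews computed on the minority strand have the opposite sign; the values reported in this study refer to the majority strand.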
Results and Discussion
Gene structure, organization and composition
The P. apollo mitogenome contains the 37 genes typical of insects (13 PCGs, 2 rRNA genes and 22 tRNA genes) and one non-coding AT-rich region (control region) (Fig. 1, Table 2). As in many other insect mitogenomes, the major strand codes for 23 genes (9 PCGs and 14 tRNAs) and the AT-rich region, while the minor strand codes for the remaining 14 genes (4 PCGs, 8 tRNAs and 2 rRNA genes). The genome is nearly the same size as that of the available congeneric mitogenome, P. bremeri (15,389 bp); at 15,404 bp it is the second largest among the Papilionidae butterflies, after P. maraho (16,094 bp) (Table 3). The gene order and orientation are similar to those of the inferred ancestral hexapod (Boore et al., 1998; Crease, 1999), with the exception of the arrangement of the tRNAs between the AT-rich region and the ND2 gene. This arrangement (M-I-Q) is found in nearly all lepidopterans, whereas the insect ground-plan arrangement is I-Q-M (Taylor et al., 1993; Cao et al., 2012; Cameron, 2014). As in other Papilionidae butterflies, the nucleotide composition of the entire P. apollo mitogenome is strongly biased: its A+T content (81.3%) is the highest among the Papilionidae species compared, equal to that of P. bremeri (Table 3). The overall AT- and GC-skews of the P. apollo mitogenome (measured on the majority strand) are -0.016 and -0.187, respectively, indicating that Ts and Cs are more frequent than As and Gs (Table 3). This is similar to the skew statistics of the other Papilionidae species, which have negligible AT-skew values (-0.040 to 0.006) and moderate GC-skew values (-0.262 to -0.191).
The protein coding genes
The P. apollo mitogenome harbors 13 protein-coding genes (PCGs), which collectively comprise 3,720 codons, excluding termination codons. This codon number is identical to that of P. machaon. Of the 13 PCGs, nine are encoded on the J-strand (ATP6, ATP8, COI, COII, COIII, CytB, ND2, ND3, ND6), while the other four are encoded on the N-strand (ND1, ND4, ND4L, ND5). All PCGs are initiated by typical ATN codons (ATP8, ND2, ND3 and ND5 with ATT; ATP6, COII, COIII, CytB, ND1, ND4 and ND4L with ATG; ND6 with ATC), except for COI, which uses CGA as its start codon. No canonical ATN initiator was found at the start site of COI. The only plausible conventional start codon is an ATC located within the tRNATyr gene, which would make COI overlap tRNATyr by 25 bp. Starting at this ATC would add nine amino acids and produce a peculiar alignment compared with other lepidopteran species. Moreover, a stop codon (TAG) follows this triplet, at the beginning of the COI gene. Consequently, this ATC is unlikely to be the COI start codon, and there are no other probable start codons for P. apollo COI. By these criteria, the first non-overlapping codon of COI is CGA, which encodes arginine and lies in a region that is highly conserved in most lepidopteran insects (Cameron and Whiting, 2008; Hao et al., 2013).
Table 2 Organization of the Parnassius apollo mitogenome.

| Gene | Direction | Position | Size (bp) | Intergenic length | Anticodon | Start codon | Stop codon |
|---|---|---|---|---|---|---|---|
| tRNAMet | F | 1-69 | 69 | 0 | 33-35 CAT | - | - |
| tRNAIle | F | 70-133 | 64 | 0 | 99-101 GAT | - | - |
| tRNAGln | R | 131-199 | 69 | -3 | 167-169 TTG | - | - |
| ND2 | F | 240-1253 | 1014 | 40 | - | ATT | TAA |
| tRNATrp | F | 1253-1318 | 66 | -1 | 1283-1285 TCA | - | - |
| tRNACys | R | 1311-1376 | 66 | -8 | 1345-1347 GCA | - | - |
| tRNATyr | R | 1381-1444 | 64 | 4 | 1412-1414 GTA | - | - |
| COI | F | 1447-2977 | 1531 | 2 | - | CGA | T-tRNA |
| tRNALeu(UUR) | F | 2978-3044 | 67 | 0 | 3008-3010 TAA | - | - |
| COII | F | 3045-3726 | 682 | 0 | - | ATG | T-tRNA |
| tRNALys | F | 3727-3797 | 71 | 0 | 3757-3759 CTT | - | - |
| tRNAAsp | F | 3797-3863 | 67 | -1 | 3828-3830 GTC | - | - |
| ATP8 | F | 3864-4025 | 162 | 0 | - | ATT | TAA |
| ATP6 | F | 4019-4696 | 678 | -7 | - | ATG | TAA |
| COIII | F | 4696-5484 | 789 | -1 | - | ATG | TAA |
| tRNAGly | F | 5488-5554 | 67 | 3 | 5518-5520 TCC | - | - |
| ND3 | F | 5555-5908 | 354 | 0 | - | ATT | TAA |
| tRNAAla | F | 5908-5973 | 66 | -1 | 5938-5940 TGC | - | - |
| tRNAArg | F | 5973-6038 | 66 | -1 | 6002-6004 TCG | - | - |
| tRNAAsn | F | 6039-6105 | 67 | 0 | 6069-6071 GTT | - | - |
| tRNASer(AGN) | F | 6109-6169 | 61 | 3 | 6030-6032 GCT | - | - |
| tRNAGlu | F | 6238-6303 | 66 | 68 | 6269-6271 TTC | - | - |
| tRNAPhe | R | 6302-6367 | 66 | -2 | 6333-6335 GAA | - | - |
| ND5 | R | 6369-8102 | 1734 | 1 | - | ATT | TAA |
| tRNAHis | R | 8103-8167 | 65 | 0 | 8135-8137 GTG | - | - |
| ND4 | R | 8167-9507 | 1341 | -1 | - | ATG | TAA |
| ND4L | R | 9507-9797 | 291 | 5 | - | ATG | TAA |
| tRNAThr | F | 9800-9866 | 67 | 2 | 9832-9834 TGT | - | - |
| tRNAPro | R | 9867-9931 | 65 | 0 | 9989-9991 TGG | - | - |
| ND6 | F | 9934-10464 | 531 | 2 | - | ATC | TAA |
| CytB | F | 10481-11629 | 1149 | 16 | - | ATG | TAA |
| tRNASer(UCN) | F | 11631-11697 | 67 | 1 | 11660-11662 TGA | - | - |
| ND1 | R | 11714-12652 | 939 | 16 | - | ATG | TAG |
| tRNALeu(CUN) | R | 12654-12722 | 69 | 1 | 12690-12692 TAG | - | - |
| lrRNA | R | 12723-14064 | 1342 | 0 | - | - | - |
| tRNAVal | R | 14065-14129 | 65 | 0 | 14097-14099 TAC | - | - |
| srRNA | R | 14130-14900 | 771 | 0 | - | - | - |
| AT-rich region | | 14901-15404 | 504 | 0 | - | - | - |

F = forward; R = reverse.
Table 3 Characteristics of the mitogenomes of Papilionidae species.

| Subfamily | Taxon | Mitogenome size (bp) | Mitogenome AT (%) | AT-skew | GC-skew | PCG(b) no. codons(a) | PCG AT (%) | tRNA size (bp) | tRNA AT (%) | rRNA size (bp) | rRNA AT (%) | AT-rich region size (bp) | AT-rich region AT (%) | GenBank accession no. | References |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Papilioninae | Papilio bianor | 15,340 | 80.6 | -0.015 | -0.210 | 3,719 | 79.0 | 1,453 | 81.4 | 2,097 | 84.2 | 498 | 94.0 | NC018040 | Xu et al., unpublished |
| | Papilio maraho | 16,094 | 80.5 | 0.006 | -0.262 | 3,717 | 78.1 | 1,442 | 80.7 | 2,112 | 84.4 | 1,270 | 94.3 | FJ810212 | Wu et al. (2010) |
| | Papilio maackii | 15,357 | 80.7 | -0.014 | -0.212 | 3,721 | 79.2 | 1,452 | 81.4 | 2,100 | 84.3 | 514 | 92.8 | NC021411 | Dong et al. (2013) |
| | Papilio machaon | 15,185 | 80.3 | -0.031 | -0.198 | 3,720 | 79.0 | 1,446 | 81.4 | 2,092 | 83.8 | 362 | 92.5 | HM243594 | Xu et al., unpublished |
| | Troides aeacus | 15,263 | 80.2 | -0.040 | -0.232 | 3,724 | 79.0 | 1,472 | 80.6 | 2,018 | 83.9 | 419 | 89.8 | EU625344 | Jiang et al., unpublished |
| | Teinopalpus aureus | 15,242 | 79.9 | -0.005 | -0.238 | 3,719 | 78.3 | 1,455 | 81.2 | 2,101 | 83.7 | 395 | 93.2 | HM563681 | Qin et al. (2012) |
| Parnassiinae | Parnassius apollo | 15,404 | 81.3 | -0.016 | -0.187 | 3,720 | 80.1 | 1,460 | 81.5 | 2,113 | 84.5 | 504 | 93.8 | KF746065 | This study |
| | Parnassius bremeri | 15,389 | 81.3 | -0.011 | -0.191 | 3,722 | 80.2 | 1,462 | 80.9 | 2,117 | 84.4 | 504 | 93.7 | FJ871125 | Kim et al. (2009) |
| | Sericinus montela | 15,242 | 80.9 | -0.009 | -0.221 | 3,691 | 79.8 | 1,457 | 81.6 | 2,097 | 83.9 | 408 | 94.1 | HQ259122 | Ji et al. (2012) |

Mitogenome values refer to the majority strand. a Termination codons were excluded from the total codon count. b Protein coding genes. Bar (-) indicates lack of sequence information on the AT-rich region in the genome.
Eleven genes have complete termination codons, either TAA (ATP6, ATP8, COIII, CytB, ND2, ND3, ND4, ND4L, ND5, ND6) or TAG (ND1), while the remaining two genes (COI and COII) end with the incomplete termination codon T. Partial termination codons (i.e., T or TA) are observed in all sequenced lepidopteran insects and have been interpreted in terms of post-transcriptional polyadenylation, by which A residues are added to create a TAA terminator (Kim et al., 2009). The relative synonymous codon usage (RSCU) analysis demonstrated that codons with A or T at the third position are always overused compared with their synonymous codons. For example, the codon TTG (Leu) is used only about twice per 1000 codons, corresponding to an RSCU of 0.07, whereas its synonymous codon TTA (Leu) is heavily overused (134 per 1000), corresponding to an RSCU of 5.42 (Table 4). This trend has also been noted in all other insect mitogenomes, indicating a universal bias toward A and T nucleotides in the PCGs (Cameron and Whiting, 2007; Nelson et al., 2012). In addition, the NC1000 statistics showed that TTT (Phe), TTA (Leu), ATT (Ile), ATA (Met), and AAT (Asn) are the five most frequently used codons, accounting for 48.36% of all codons (Table 4). Similar cases have been detected in other lepidopterans (data not shown).
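The two codon statistics used here can be computed from a set of coding sequences as follows. This is a minimal sketch, not the authors' pipeline: the function name `codon_usage` is illustrative, the input is assumed to be in-frame coding sequences on their coding strand, and the invertebrate mitochondrial genetic code (NCBI translation table 5) is assumed. RSCU for a codon is its observed count divided by the count expected if all synonymous codons of that amino acid were used equally; NC1000 is its count per 1000 codons.

```python
from collections import Counter
from itertools import product

# Invertebrate mitochondrial code (NCBI translation table 5), with codons
# enumerated in TCAG order: TTT, TTC, TTA, TTG, TCT, ...
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSSSVVVVAAAADDEEGGGG"
GENETIC_CODE = {
    "".join(codon): aa for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)
}


def codon_usage(cds_list):
    """Return {codon: (amino_acid, RSCU, NC1000)} for all sense codons.

    Stop codons, incomplete trailing codons and codons containing
    ambiguous bases are skipped.
    """
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper()
        for i in range(0, len(cds) - len(cds) % 3, 3):
            codon = cds[i:i + 3]
            if GENETIC_CODE.get(codon, "*") != "*":
                counts[codon] += 1

    total = sum(counts.values())
    family_size = Counter(GENETIC_CODE.values())   # synonymous codons per amino acid
    family_total = Counter()                       # observed codons per amino acid
    for codon, n in counts.items():
        family_total[GENETIC_CODE[codon]] += n

    table = {}
    for codon, aa in GENETIC_CODE.items():
        if aa == "*":
            continue
        observed = counts[codon]
        rscu = observed * family_size[aa] / family_total[aa] if family_total[aa] else 0.0
        nc1000 = 1000 * observed / total if total else 0.0
        table[codon] = (aa, rscu, nc1000)
    return table
```

Under this code leucine has six synonymous codons and serine eight, so the RSCU values within each of those families sum to 6 and 8, respectively, which matches the scale of the values reported in Table 4.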
Table 4 Relative synonymous codon usage (RSCU) and number of codons per 1000 codons (NC1000) in the protein-coding genes of the Parnassius apollo mitogenome.

| Amino acid | Codon | RSCU | NC1000 | Amino acid | Codon | RSCU | NC1000 | Amino acid | Codon | RSCU | NC1000 | Amino acid | Codon | RSCU | NC1000 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| F | TTT | 1.84 | 90.86 | S (UCN) | TCT | 2.82 | 30.38 | Y | TAT | 1.94 | 49.46 | C | TGT | 1.82 | 8.06 |
| | TTC | 0.16 | 7.80 | | TCC | 0.27 | 2.96 | | TAC | 0.06 | 1.61 | | TGC | 0.18 | 0.81 |
| L (UUR) | TTA | 5.42 | 133.87 | | TCA | 2.24 | 24.19 | | | | | W | TGA | 1.98 | 25.27 |
| | TTG | 0.07 | 1.61 | | TCG | 0.05 | 0.54 | | | | | | TGG | 0.02 | 0.27 |
| L (CUN) | CTT | 0.27 | 6.72 | P | CCT | 2.40 | 19.35 | H | CAT | 1.92 | 18.28 | R | CGT | 1.31 | 4.84 |
| | CTC | 0.02 | 0.54 | | CCC | 0.37 | 2.96 | | CAC | 0.08 | 0.81 | | CGC | 0.15 | 0.54 |
| | CTA | 0.22 | 5.38 | | CCA | 1.23 | 9.95 | Q | CAA | 1.97 | 16.13 | | CGA | 2.33 | 8.60 |
| | CTG | 0.00 | 0.00 | | CCG | 0.00 | 0.00 | | CAG | 0.03 | 0.27 | | CGG | 0.22 | 0.81 |
| I | ATT | 1.92 | 116.94 | T | ACT | 2.36 | 24.46 | N | AAT | 1.87 | 65.32 | S (AGN) | AGT | 0.67 | 7.26 |
| | ATC | 0.08 | 5.11 | | ACC | 0.21 | 2.15 | | AAC | 0.13 | 4.57 | | AGC | 0.02 | 0.27 |
| M | ATA | 1.87 | 76.61 | | ACA | 1.40 | 14.52 | K | AAA | 1.92 | 25.81 | | AGA | 1.92 | 20.7 |
| | ATG | 0.13 | 5.38 | | ACG | 0.03 | 0.27 | | AAG | 0.08 | 1.08 | | AGG | 0.00 | 0.00 |
| V | GTT | 2.05 | 16.13 | A | GCT | 2.42 | 19.35 | D | GAT | 1.90 | 15.59 | G | GGT | 1.06 | 13.98 |
| | GTC | 0.09 | 0.81 | | GCC | 0.24 | 1.88 | | GAC | 0.10 | 0.81 | | GGC | 0.04 | 0.54 |
| | GTA | 1.83 | 15.86 | | GCA | 1.34 | 10.75 | E | GAA | 1.87 | 19.35 | | GGA | 2.52 | 33.33 |
| | GTG | 0.03 | 0.27 | | GCG | 0.00 | 0.00 | | GAG | 0.13 | 1.34 | | GGG | 0.39 | 5.11 |

Termination codons were excluded from the count due to the uncertainty in many species.

The intergenic spacer sequences
The P. apollo mitogenome includes a total of 164 bp of intergenic spacer sequence, spread over 14 non-coding regions ranging from 1 to 68 bp. The four longest spacers are located between tRNAGln and ND2 (40 bp), tRNASer(AGN) and tRNAGlu (68 bp), ND6 and CytB (16 bp), and tRNASer(UCN) and ND1 (16 bp) (Table 2). The tRNAGln-ND2 spacer has been detected in most other Papilionidae species, with a size range of 40 to 72 bp. This sequence is 70% homologous to the neighboring ND2 gene, suggesting that it may be derived from ND2 (Fig. 2). The tRNASer(AGN)-tRNAGlu spacer is unique to the genus Parnassius; in P. bremeri the corresponding region is 43 bp long (Kim et al., 2009). This spacer appears to be the result of an 8-fold repetition of a TTTCTTCT motif or a 4-fold repetition of a CTTTATTT motif (Fig. 2). The third spacer (16 bp) shows a low level of sequence similarity to its neighboring ND6 and CytB genes and varies considerably in length among the sequenced lepidopterans (data not shown). Finally, the last spacer (16 bp) was detected in all sequenced Papilionidae butterflies and harbors an ATACTAA motif (Fig. 3). Because this intergenic spacer is located at the end of the major-strand coding region, the 7-bp motif has been suggested to be a possible binding site for mtTERM, the transcription termination peptide (Cameron and Whiting, 2008; Taanman, 1999).
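The intergenic lengths listed in Table 2 follow from the gene coordinates: the gap before a gene is its start position minus the end position of the preceding gene, minus one, with negative values indicating overlap. The sketch below is illustrative only; the function name `intergenic_lengths` is an assumption, and the coordinates are the first six entries after tRNAIle as given in Table 2.

```python
def intergenic_lengths(features):
    """Return (upstream, downstream, gap) for consecutive features.

    `features` is a list of (name, start, end) tuples with 1-based inclusive
    coordinates, ordered along the genome. A positive gap is an intergenic
    spacer; a negative gap is an overlap between adjacent genes.
    """
    gaps = []
    for (name_a, _, end_a), (name_b, start_b, _) in zip(features, features[1:]):
        gaps.append((name_a, name_b, start_b - end_a - 1))
    return gaps


# Coordinates taken from Table 2 (tRNAGln through COI).
features = [
    ("tRNAGln", 131, 199),
    ("ND2", 240, 1253),
    ("tRNATrp", 1253, 1318),
    ("tRNACys", 1311, 1376),
    ("tRNATyr", 1381, 1444),
    ("COI", 1447, 2977),
]

for upstream, downstream, gap in intergenic_lengths(features):
    print(f"{upstream} / {downstream}: {gap}")  # gaps: 40, -1, -8, 4, 2
```

Summing only the positive gaps over the whole genome gives the total spacer length, and counting them gives the number of non-coding regions.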
Fig. 2. Sequences of two relatively large intergenic spacers. (A) Alignment of the spacer sequence located between tRNAGln and ND2 gene, and the neighboring partial ND2 gene of Parnassius apollo. Asterisks indicate consensus sequences in the alignment between the spacer sequence and the ND2 gene. Sequence homology is shown on the right side of the alignment. (B) The intergenic spacer sequence detected between the tRNASer(AGN) and tRNAGlu of Parnassius apollo (68 bp), and the alignment of repeat sequences detected within the intergenic spacer sequence. The nucleotide position is indicated at the beginning and end sites of the sequence.
Fig. 3. Alignment of the intergenic spacer between tRNASer(UCN) and ND1 among all sequenced Papilionidae butterflies, with the ATACTAA motif shaded.
The tRNA genes
There are 22 tRNA genes in the P. apollo mitogenome, ranging in length from 61 bp (tRNASer(AGN)) to 71 bp (tRNALys) (Table 2). The nucleotide composition of these 22 tRNA genes (1,460 bp in total) is AT-biased (81.5%). All of them have the typical clover-leaf secondary structure, with the exception of tRNASer(AGN), which lacks the dihydrouridine (DHU) arm (Fig. 4). The same has been found in most insects, including all lepidopterans studied to date (Wolstenholme, 1992; Salvato et al., 2008; Hu et al., 2010; Wang et al., 2011; Sun et al., 2012; Zhao et al., 2013). All P. apollo tRNAs possess 7-bp aminoacyl stems, 7-nt anticodon loops and 5-bp anticodon stems, but other portions of the tRNAs vary in length, particularly the DHU and TΨC loops (4-7 nt and 3-9 nt, respectively). A total of 11 mismatched pairs are found in the tRNA stem regions: 6 U-U, 2 A-C, 2 U-C and 1 G-A (Fig. 4).
Fig. 4. Predicted clover-leaf secondary structures of the Parnassius apollo tRNA genes.
The rRNA genes
As in all other insect mitogenomes, two rRNA genes (a 1,342-bp lrRNA and a 771-bp srRNA) are found in the P. apollo mitogenome. They are located between tRNALeu(CUN) and tRNAVal, and between tRNAVal and the AT-rich region, respectively (Table 2), with A+T contents of 84.1% and 85.3%. Both values are well within the range reported for other lepidopteran insects (Salvato et al., 2008). In the predicted secondary structures, the lrRNA contains six domains (labeled I, II, III, IV, V, and VI) with 49 helices, while the srRNA contains three domains (labeled I, II, III) with 33 helices (Figs. 5 and 6). The structural features of both the lrRNA and the srRNA are quite similar to those of their counterparts in Apis mellifera and Manduca sexta (Gillespie et al., 2006; Cameron and Whiting, 2008).
Fig. 5. Predicted secondary structure of the Parnassius apollo lrRNA. Roman numerals denote the conserved domain structure. Helices are numbered according to the annotation systems of Gillespie et al. (2006). Tertiary structures are denoted by boxed bases joined by solid lines. Watson-Crick pairs are joined by dashes, other interactions are joined by plus signs.
Fig. 6. Predicted secondary structure of the Parnassius apollo srRNA. The annotation is the same as in Fig. 5.
The AT-rich region
The 504-bp AT-rich region of the P. apollo mitogenome is located between srRNA and tRNAMet and has a high A+T content (93.8%), well within the range reported for the other Papilionidae species, from 92.5% in P. machaon to 94.3% in P. maraho (Table 3). The region is composed mostly of non-repetitive sequence but harbors several structures characteristic of lepidopterans: the putative ON (origin of minority, or light, strand replication), located 22 bp upstream of the 5'-end of the srRNA gene and consisting of the motif ATAGA followed by a 17-bp poly-T stretch; and a microsatellite-like (TA)9 repeat preceded by the ATTTA motif (Fig. 7). Another (TA)9 microsatellite repeat, located 126 bp upstream of the srRNA gene, is found in both Parnassius species (Fig. 7).
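Motifs of this kind can be located with simple pattern matching. The following is a minimal sketch under stated assumptions: the function name `find_at_rich_motifs`, the length thresholds and the toy sequence are all illustrative and are not taken from the P. apollo sequence; only the strand supplied is scanned, so a motif annotated on the opposite strand would require the reverse complement.

```python
import re


def find_at_rich_motifs(seq, min_poly_t=10, min_ta_repeats=5):
    """Scan one strand for ATAGA + poly-T and ATTTA + (TA)n motifs.

    Returns a list of (label, 0-based start, size) tuples, where size is the
    poly-T length in bases or the number of TA repeat units.
    """
    seq = seq.upper()
    hits = []
    # ATAGA immediately followed by a run of at least `min_poly_t` Ts.
    for m in re.finditer(r"ATAGA(T{%d,})" % min_poly_t, seq):
        hits.append(("ATAGA+polyT", m.start(), len(m.group(1))))
    # ATTTA immediately followed by at least `min_ta_repeats` TA units.
    for m in re.finditer(r"ATTTA((?:TA){%d,})" % min_ta_repeats, seq):
        hits.append(("ATTTA+(TA)n", m.start(), len(m.group(1)) // 2))
    return hits


# Toy sequence built for illustration only (not the real AT-rich region).
toy = "GGC" + "ATAGA" + "T" * 17 + "AACG" + "ATTTA" + "TA" * 9 + "GCA"
print(find_at_rich_motifs(toy))
```

Both patterns require the flanking motif to sit directly next to the repeat; allowing a few intervening bases would need a looser expression.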
Fig. 7. Characteristic sequences of the AT-rich region of Parnassius apollo. (A) The special TA repeat sequences of the AT-rich region in Parnassius apollo and Parnassius bremeri. (B) Sequence of the Parnassius apollo AT-rich region. The shaded areas show the ATAGA motif, poly-T stretch, ATTTA sequence and microsatellite TA repeat sequence. The underlined sequences show the tRNATrp-like sequence and the tRNALeu(UUR)-like sequence. (C) Secondary structures of the tRNATrp-like sequence and the tRNALeu(UUR)-like sequence.
It has previously been demonstrated that tRNA-like sequences within the AT-rich region of mammalian mitogenomes arise from a failure to cleave the tRNA primers from the nascent DNA strand after mitochondrial DNA synthesis, so that the tRNA-like sequences are incorporated into the mitogenome (Cantatore et al., 1987). tRNA-like sequences have since been reported in many insects, including Hymenoptera (Cha et al., 2007), Diptera (Cameron et al., 2007), Lepidoptera (Kim et al., 2009), and Coleoptera (Hong et al., 2009). In P. apollo, one tRNATrp-like and one tRNALeu(UUR)-like sequence are detected in the AT-rich region, as is the case in its congener P. bremeri (Fig. 7).
Acknowledgements
This work was supported by the National Science Foundation of China (Grant No. 41172004) and the Opening Funds from the State Key Laboratory of Palaeobiology and Stratigraphy, Nanjing Institute of Geology and Palaeontology, Chinese Academy of Sciences (Grant No. 104143).
References Benson, G., 1999. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 27, 573–580.
ACCEPTED MANUSCRIPT
AC
CE
PT
ED
MA NU
SC
RI P
T
Boore, J.L., 1999. Animal mitochondrial genomes. Nucleic Acids Res. 27, 1767–1780. Boore, J.L., Lavrov, D.V., Brown, W.M., 1998. Gene translocation links insects and crustaceans. Nature 392, 667–668. Cameron, S.L., 2014. Insect mitochondrial genomics: Implications for evolution and phylogeny. Annu. Rev. Entomol. 59, 95–117. Cameron, S.L., Lambkin, C.L., Barker, S.C., Whiting, M.F., 2007. A mitochondrial genome phylogeny of Diptera: whole genome sequence data accurately resolve relationships over broad timescales with high precision. Syst. Entomol. 32, 40–59. Cameron, S.L., Whiting, M.F., 2007. Mitochondrial genomic comparisons of the subterranean termites from the Genus Reticulitermes (Insecta: Isoptera: Rhinotermitidae). Genome 50, 188–202. Cameron, S.L., Whiting, M.F., 2008. The complete mitochondrial genome of the tobacco hornworm, Manduca sexta, (Insecta: Lepidoptera: Sphingidae), and an examination of mitochondrial gene variability within butterflies and moths. Gene 408, 112–123. Cantatore, P., Gadaleta, M.N., Roberti, M., Saccone, C., Wilson, A.C., 1987. Duplication and remodeling of tRNA genes during the evolutionary rearrangement of mitochondrial genomes. Nature 329, 853–855. Cao, Y.Q., Ma, C., Chen, J.Y., Yang, D.R., 2012. The complete mitochondrial genomes of two ghost moths, Thitarodes renzhiensis and Thitarodes yunnanensis: the ancestral gene arrangement in Lepidoptera. BMC Genomics 13, 276. Carter, D., 2000. Butterflies and Moths. Dorling Kindersley, London. Caterino, M.S., Sperling, F.A.H., 1999. Papilio phylogeny based on mitochondrial cytochrome oxidase I and II genes. Mol. Phylogenet. Evol. 11, 122–137. Cha, S.Y., Yoon, H.J., Lee, E.M., Yoon, M.H., Hwang, J.S., Jin, B.R., Han, Y.S., Kim, I., 2007. The complete nucleotide sequence and gene organization of the mitochondrial genome of the bumblebee, Bombus ignitus (Hymenoptera: Apidae). Gene 392, 206–220. Collins, N.M., Morris, M.G., 1985. Threatened swallowtail butterflies of the world. 
IUCN, Gland, Switzerland and Cambridge, UK. Crease, T., 1999. The complete sequence of the mitochondrial genome of Daphnia pulex (Cladocera: Crustacea). Gene 233, 89–99. Dong, Y., Zhu, L.X., Wu, Y.F., Wu, X.B., 2013. The complete mitochondrial genome of the Alpine black swallowtail, Papilio maackii (Insecta: Lepidoptera: Papilionidae). Mitochondrial DNA 24, 639–641. Gillespie, J.J., Johnston, J.S., Cannone, J.J., Gutell, R.R., 2006. Characteristics of the nuclear (18S, 5.8S, 28S and 5S) and mitochondrial (12S and 16S) Rrna genes of Apis mellifera (Insecta: Hymenoptera): structure, organization and retrotransposable elements. Insect Mol. Biol. 15, 657–686. Gimenez, D.M., 1996. Parnassius apollo. In: IUCN 2013. IUCN Red List of Threatened Species. Version 2013.1. . Assessed 1 August 1996. Hall, T.A., 1999. BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/ NT. Nucleic Acids Symp. Ser. 41, 95–98. Hao, J.S., Sun, M.E., Sun, X.Y., Shao, L.L., Yang, Q., 2013. Complete mitogenomes of Euploea mulciber (Nymphalidae: Danainae) and Libythea celtis (Nymphalidae: Libytheinae) and their phylogenetic implications. ISRN Genomics 491636, 1–14. Hong, M.Y., Jeong, H.C., Kim, M.J., Jeong, H.U., Lee, S.H., Kim, I., 2009. Complete mitogenome sequence of the jewel beetle, Chrysochroa fulgidissima (Coleoptera:
Buprestidae). Mitochondrial DNA 20, 46–60.
Hu, J., Zhang, D.X., Hao, J.S., Huang, D.Y., Cameron, S., Zhu, C.D., 2010. The complete mitochondrial genome of the yellow coaster, Acraea issoria (Lepidoptera: Nymphalidae: Heliconiinae: Acraeini): sequence, gene organization and a unique tRNA translocation event. Mol. Biol. Rep. 37, 3431–3438.
Ji, L.W., Hao, J.S., Wang, Y., Huang, D.Y., Zhao, J.L., Zhu, C.D., 2012. The complete mitochondrial genome of the dragon swallowtail, Sericinus montela Gray (Lepidoptera: Papilionidae) and its phylogenetic implication. Acta Entomol. Sin. 55, 91–100.
Kim, M.I., Baek, J.Y., Kim, M.J., Jeong, H.C., Kim, K.G., Bae, C.H., Han, Y.S., Jin, B.R., Kim, I., 2009. Complete nucleotide sequence and organization of the mitogenome of the red-spotted apollo butterfly, Parnassius bremeri (Lepidoptera: Papilionidae) and comparison with other lepidopteran insects. Mol. Cells 28, 347–363.
Lowe, T.M., Eddy, S.R., 1997. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 25, 955–964.
Nardi, F., Carapelli, A., Dallai, R., Roderick, G.K., Frati, F., 2005. Population structure and colonization history of the olive fly Bactrocera oleae (Diptera: Tephritidae). Mol. Ecol. 14, 2729–2738.
Nardi, F., Spinsanti, G., Boore, J.L., Carapelli, A., Dallai, R., Frati, F., et al., 2003. Hexapod origins: monophyletic or paraphyletic? Science 299, 1887–1889.
Nelson, L.A., Lambkin, C.L., Batterham, P., Wallman, J.F., Dowton, M., Whiting, M.F., Yeates, D.K., Cameron, S.L., 2012. Beyond barcoding: a mitochondrial genomics approach to molecular phylogenetics and diagnostics of blowflies (Diptera: Calliphoridae). Gene 511, 131–142.
Perna, N.T., Kocher, T.D., 1995. Patterns of nucleotide composition at fourfold degenerate sites of animal mitochondrial genomes. J. Mol. Evol. 41, 353–358.
Qin, F., Jiang, G.F., Zhou, S.Y., 2013. Complete mitochondrial genome of the Teinopalpus aureus guangxiensis (Lepidoptera: Papilionidae) and related phylogenetic analyses. Mitochondrial DNA 23, 123–125.
Salvato, P., Simonato, M., Battisti, A., Negrisolo, E., 2008. The complete mitochondrial genome of the bag-shelter moth Ochrogaster lunifer (Lepidoptera, Notodontidae). BMC Genomics 9, 331.
Simmons, R.B., Weller, S.J., 2001. Utility and evolution of cytochrome b in insects. Mol. Phylogenet. Evol. 20, 196–210.
Simon, C., Buckley, T.R., Frati, F., Stewart, J.B., Beckenbach, A.T., 2006. Incorporating molecular evolution into phylogenetic analysis and a new compilation of conserved polymerase chain reaction primers for animal mitochondrial DNA. Annu. Rev. Ecol. Evol. Syst. 37, 545–579.
Singh, V.K., Mangalam, A.K., Dwivedi, S., Naik, S., 1998. Primer premier: program for design of degenerate primers from a protein sequence. BioTechniques 24, 318–319.
Still, J., 1996. Butterflies and Moths of Britain and Europe. Harper Collins, London.
Sun, Q.Q., Sun, X.Y., Wang, X.C., Gai, Y.H., Hu, J., Zhu, C.D., Hao, J.S., 2012. Complete sequence of the mitochondrial genome of the Japanese buff-tip moth, Phalera flavescens (Lepidoptera: Notodontidae). Genet. Mol. Res. 11, 4213–4225.
Taanman, J.W., 1999. The mitochondrial genome: structure, transcription, translation and replication. Biochim. Biophys. Acta 1410, 103–123.
Tamura, K., Peterson, D., Peterson, N., Stecher, G., et al., 2011. MEGA5: molecular evolutionary
genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Mol. Biol. Evol. 28, 2731–2739.
Taylor, M.F., McKechnie, S.W., Pierce, N., Kreitman, M., 1993. The lepidopteran mitochondrial control region: structure and evolution. Mol. Biol. Evol. 10, 1259–1272.
Thompson, J.D., Gibson, T.J., Plewniak, F., Jeanmougin, F., 1997. The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 25, 4876–4882.
Wang, X.C., Sun, X.Y., Sun, Q.Q., Zhang, D.X., Hu, J., Yang, Q., Hao, J.S., 2011. The complete mitochondrial genome of the laced fritillary Argyreus hyperbius (Lepidoptera: Nymphalidae). Zool. Res. 32, 465–475.
Wolstenholme, D.R., 1992. Animal mitochondrial DNA: structure and evolution. Int. Rev. Cytol. 141, 173–216.
Wu, L.W., Lees, D.C., Yen, S.H., Hsu, Y.F., 2010. The complete mitochondrial genome of the near-threatened swallowtail, Agehana maraho (Lepidoptera: Papilionidae): evaluating sequence variability and suitable markers for conservation genetic studies. Entomol. News 121, 267–280.
Zhao, F., Huang, D.Y., Shi, Q.H., Hao, J.S., Sun, X.Y., Zhang, L.L., Yang, Q., 2013. The first mitochondrial genome for the butterfly family Riodinidae and its systematic implications. Zool. Res. 34, 109–119.