Accepted Manuscript Molecular evolution and expression of the CRAL_TRIO protein family in insects Gilbert Smith, Adriana D. Briscoe PII:
S0965-1748(15)00026-0
DOI:
10.1016/j.ibmb.2015.02.003
Reference:
IB 2669
To appear in:
Insect Biochemistry and Molecular Biology
Received Date: 1 December 2014 Revised Date:
29 January 2015
Accepted Date: 3 February 2015
Please cite this article as: Smith, G., Briscoe, A.D., Molecular evolution and expression of the CRAL_TRIO protein family in insects, Insect Biochemistry and Molecular Biology (2015), doi: 10.1016/ j.ibmb.2015.02.003. This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
908 930 4 004 0062979 XP x 2 Bm s Mse S201 97 2 M OG 317 094 4 8 DP P Dp Ag X FA0 378 5 E 2 Tc XP 6 2327 39 7 0 Am NP 310 9076 0 4 Dm g XP 493 013 19 A 00 S2 91 60 0 8 3 XP OG P 6 59 82 5 4 Bm DP m N P 62 Dp D g X XP A Am
Bm XP 004922800 Bm X P 0 0 4 9 2 2 7 6 2 Ms Msex005131a Ms Msex00 5131b Bm XP Dp DP 004922763 Dp DP OGS212706 O G Ms M S212 Bm sex005 705 Bm XP 004 127 Ms XP 004 922758 M Dp sex 9227 Dp DPO 0051 57 Ms DPO GS2 24 Bm Mse GS2 12755 Tc X x0 127 A XP P 00 1195 54 D m X 96 492 9 Ag m N P 0 7589 7864 M X P 06 B s P 65 565 D m Ms 31 188 388 Tc p B XP ex0 802 2 XP F0 00 019 7 1 4 00 01 928 52 18 6B 97 10 2F 1 29 09 0
ACCEPTED MANUSCRIPT Ms Manduca sexta Dp Danaus plexippus Bm Bombyx mori
64 02 09 1 31 18 65 93 7 02 06 P 9 X 1 2 29 6 g P 3 S A c X XP OG 059 883 4 T g DP ex0 492 63 A p s 0 29 D s M P 0 049 860 M m X P 0 000 B m X sex 311 B s M J75 460 M EH 969 77 Dp XP 3120 13 Tc XP 5723 61 Ag NP 0008 Dm Msex 00862 5 Ms Msex0 92963 Ms XP 004 2 Bm HJ7531 2928 Dp E P 00492 6 Bm X P 00492963 Bm X J70837 Dp EH 639 29 49 00 Bm XP 63 Ms Msex0008 Tc XP 966309 Tc XP 008197657 Tc XP 971470 Tc XP 971527 Tc XP 970129 Dp DPOG Bm XP S206990 Ms Mse 004928861 Bm X x010502 Dp D P 00493 Ms POGS2 3930 Bm Msex0 06941 Bm XP 00 10499 Bm XP 0 49313 Ms XP 0493 68 Bm Ms 0049 1367 Ms XP ex004 31422 51 M M 00 M s M sex 4928 9 B s M sex 010 857 Ms m X sex 010 488 D M P 0 012 487 s 0 D p 1 D p EH ex 49 01 D p D EH J79 010 2905 p P J7 2 49 EH O 9 91 2 4 J7 GS 292 Lepidopteran 92 20 89 69 expansion 88
SC
M AN U TE D EP AC C
7 2 49 05 10 29 x0 49 498 se 0 0 96 63 M 0 01 4 s XP x 10 88 M m Mse ex0 492 89 s 0 04 B s M s M P 0 01 495 M m X sex 010 B s M sex M sM M
Bootstrap support >80%
Insect CRAL_TRIO domain proteins
Dp DP Dp D O Bm DP p GS OG EH 214 X J Dp Ms MP 00 S2 72 230 9 Dp DP se 492 142 49 D OG x0 9 29 Ms POG S2 038 378 Bm M S 14 04 XP sex 214 232 0 2 Dm 0049 038 33 0 Ag ABV 2924 5 8 8 X Ag P 32 2392 Dm XP 31 0369 8 NP Dm 6 052 NP 1051 Tc X 72481 0 Dp E P 97310 3 HJ 6 Dp E 78510 Bm XP HJ78511 0049 Ms com 29247 p5 Bm XP 0049 6142 29246 Ms scaffold00053 Bm XP 004929244 Tc XP 968368 Dm NP 730931 Ag XP 003436154 5961 Ms Msex00 929 4928 Bm XP 00HJ74795 E 8 Dp 56332 6 P 006 Am X EHJ790832 Dp P 9732 8 X Tc 031 P 97 355 Tc X 01040 5959 0 0 0 P e N 084 Bm Ms Ms HJ79 085 E 9 0 Dp EHJ7 0596 1 3 Dp sex0 289 9 M 4 Ms P 00 X m B
Clathrin box
T Dp c X Bm D P P 0 XP OG 08 M Bm s S 1 M 00 2 93 XP se 49 013 74 x0 30 4 1 Dp Ms 0 0 M 4 0 90 5 Tc DPO sex 93 629 6 Am XP GS 014 091 5 1 0 2 Dm XP 0 081 013 871 93 4 0 N Ms P 00 6566 2177 4 1 M Bm se 260 55 Dp XP 00 x0107 581 DPO 49 0 GS 2562 2 Ms 2 5 M sex 15399 Bm 01 X Dp D P 00492 4334 POG 5626 S Dp D 2000 PO 2 Bm XP GS21539 2 8 00 Ms Mse 4925639 x01070 Tc XP 9723 1 45 Ag XP 317327 Bm XP 004931769 Dp DPOGS207391 Ms Msex002032 53 Bm XP 004922581 23 Ms Msex0163497 13 S2 Dp DPOG 87951 0016 Ag XP P 648579 Dm N 300003 PSCFx018304 D p D se 557 Ms M XP 315 939 Ag P 523 155 N 6 Dm GS20 9842 DPO 492 326 Dp XP 00 x010 312 0 e Bm s Ms 0325 49212 M P0 97 8 X XP 209 42 Am Tc XP 3 6510 001 0 8 Ag NP F30 892 2 6 A C 2 PD PS 049 059 250 9 Dm p D P 0 ex0 931 703 08 D X s 4 0 3 PINTA 0 M 4 Bm Ms P 0 GS2 00 X O ex P s Bm D M p s D M
RI PT
Tc Tribolium castaneum Am Apis mellifera Ag Anopheles gambiae Dm Drosophila melanogaster
ACCEPTED MANUSCRIPT
1
Molecular evolution and expression of the CRAL_TRIO protein family in insects
2 3
Gilbert Smith1* and Adriana D. Briscoe1
4
1
5
*Corresponding author
RI PT
Department of Ecology and Evolutionary Biology, University of California, Irvine, CA 92697
6 7
SC
8 9
M AN U
10 11 12 13 14
18 19 20 21 22 23 24 25
EP
17
AC C
16
TE D
15
Keywords: Alpha-tocopherol transfer protein; retinoid-binding protein; CRAL_TRIO; SEC14; gene duplication
26 27
1
ACCEPTED MANUSCRIPT
1
Abstract CRAL_TRIO domain proteins are known to bind small lipophilic molecules such as
3
retinal, inositol and Vitamin E and include such gene family members as PINTA, α-tocopherol
4
transfer (ATT) proteins, retinoid binding proteins, and clavesins. In insects, very little is known
5
about either the molecular evolution of this family of proteins or their ligand specificity. Here we
6
characterize insect CRAL_TRIO domain proteins and present the first insect CRAL_TRIO
7
protein phylogeny constructed by performing reciprocal BLAST searches of the reference
8
genomes of Drosophila melanogaster, Anopheles gambiae, Apis mellifera, Tribolium castaneum,
9
Bombyx mori, Manduca sexta and Danaus plexippus. We find several highly conserved amino
10
acid residues in the CRAL_TRIO domain-containing genes across insects, and a gene expansion
11
resulting in more than twice as many gene family members in lepidopterans than other surveyed
12
insect species, but no lepidopteran homolog of the PINTA gene in Drosophila. In addition, we
13
examined the expression pattern of CRAL_TRIO domain genes in M. sexta heads using RNA-
14
Seq data. Of the 42 gene family members found in the M. sexta reference genome, we found 30
15
expressed in the head tissue with similar expression profiles between males and females. Our
16
results suggest this gene family underwent a large expansion in Lepidoptera, making the
17
leptidopteran CRAL_TRIO domain family distinct from other holometabolous insect lineages.
SC
M AN U
TE D
EP
AC C
18
RI PT
2
2
ACCEPTED MANUSCRIPT
1
1. Introduction The CRAL_TRIO domain is an N-terminal ligand binding region within a larger protein
3
domain (SEC14) that takes its name from common sequence elements in cellular retinaldehyde
4
binding protein (CRALBP) and the signaling domain of the TRIO guanine exchange factor
5
(Panagabko et al., 2003). The domain was first characterized three-dimensionally from structures
6
of yeast phosphatidylinositol transfer protein SEC14p (Sha et al., 1998) and human supernatant
7
protein factor (SPF; Stocker et al., 2002) but is present in most organisms and is likely to be of
8
ancient evolutionary origin (Saito et al., 2007). The domain structure is comprised of alternating
9
α-helices and β-strands (Fig. 1) that bind small lipophilic molecules such as retinal, inositol,
10
squalene, and Vitamin E, the latter of which is composed of α-, β-, γ-, and δ-tocopherols
11
(Panagabko et al., 2003). CRAL_TRIO domains are found alongside many other domains in
12
proteins, including the Golgi dynamics (GOLD), protein tyrosine phosphatase (PTP), RasGAP,
13
RhoGAP, RhoGEF, pleckstrin homology (PH), macro, spectrin repeats (SPEC), Src homology
14
(SH3) and protein kinase domains (Gupta et al.,2012; Saito et al., 2007).
TE D
M AN U
SC
RI PT
2
Relatively few members of the large CRAL_TRIO gene family have been studied for
16
ligand specificity. Those that have been studied include human α-tocopherol transfer protein (α-
17
TTP), SPF (which binds squalene), cellular retinaldehyde binding protein (CRALBP, which
18
binds retinal) and Saccharomyces cerevisiae phosphatidylinositol transfer protein (SEC14p,
19
which binds phosphatidylinositol and phosphatidylcholine). Ligand specificity of these related
20
proteins has been compared in one study that sought to explain why mammals are able to
21
selectively reuptake α-tocopherol regardless of the mixture of dietary tocopherols ingested
22
(Panagabko et al., 2003). In humans, mutations in α-TTP can result in a neurodegenerative
23
disease called ataxia with isolated vitamin E deficiency (AVED) that causes a variety of
AC C
EP
15
3
ACCEPTED MANUSCRIPT
neurological symptoms (Ouahchi et al., 1995). While, of the four proteins compared, α-TTP
2
exhibited the highest binding affinity for α-tocopherol, promiscuity in ligand binding was
3
observed in the other three proteins, suggesting that caution should be exercised when attempting
4
to infer function of these proteins from single ligand studies. The SEC14 domain contains a large
5
flexible lipid binding hydrophobic pocket that is closed off by the N-terminal domain, and
6
docking studies suggest different lipids may bind SEC14 in several different ways (Saito et al.,
7
2007). Few studies have identified key residues that interact with particular ligands. However,
8
structural and mutational analysis of yeast SEC14 suggests that residues E207 and K239 are
9
critical for phosphatidylinositol head group binding (Sha et al., 1998), and AVED causing
10
mutations in human α-TTP are located in the binding pocket (Bromley et al., 2013; Min et al.,
11
2003).
M AN U
SC
RI PT
1
Besides α-tocopherol binding and transport by α-TTP, and phosphatidylinositol/
13
phosphatidylcholine exchange and transfer observed in SEC14p, it has been proposed that other
14
gene family members are involved in lipid-mediated regulatory functions in organelles and in
15
intracellular traffic (Saito et al., 2007). For example, members of the clavesin (clathrin vesicle-
16
associated SEC14 proteins; Katoh et al., 2009) gene family are found abundantly in mammalian
17
brain tissue on clathrin-coated vesicles that originate from the trans-Golgi network and form the
18
major protein transport pathway from the secretory system to endosomes/lysosomes (Katoh et
19
al., 2009). Experimental work by Katoh et al. (2009) indicates that clavesins specifically function
20
in neurons in a transport pathway between early endosomes and mature lysosomes and that one
21
of the molecules they bind is phosphatidylinositol 3,5-bisphosphate (PtdIns(3,5)P2). Mutations in
22
mice that deplete PtdIns(3,5)P2 produce neurodegeneration phenotypes with little effect on other
23
tissues (Zhang et al., 2007).
AC C
EP
TE D
12
4
ACCEPTED MANUSCRIPT
In insects, the function of only one member of the CRAL_TRIO domain protein, PINTA,
2
has been determined. PINTA or prolonged depolarization afterpotential (PDA) is not apparent
3
was discovered in a screen for Drosophila eye mutants that are deficient in rhodopsin biogenesis
4
(Wang and Montell, 2005). Visual pigments or rhodopsins are composed of an opsin protein
5
covalently linked to derivatives of Vitamin A (all-trans-retinol), such as 11-cis-retinal. Although
6
the cells that compose the vertebrate and invertebrate eye differ significantly, the function of
7
PINTA is somewhat reminiscent of human CRALBP, which is involved in the transport of the
8
visual pigment chromophore, 11-cis-retinal, in the photoreceptors of the human eye (Crabb et al.,
9
1988). Humans that carry mutations in the CRALBP gene may develop retinitis pigmentosa or
10
retinitis punctata albescence (Fishman et al., 2004). In Drosophila, PINTA is localized to the
11
retinal pigment cells, where all-trans 3-hydroxyretinol is converted to 11-cis 3-hydroxyretinal
12
(Pak et al., 2012; Wang and Montell, 2005). Ligand-binding assays indicate that PINTA
13
preferentially binds all-trans-retinol (Wang and Montell, 2005). It has been proposed that PINTA
14
either sequesters the 3-hydroxy version of all-trans-retinol in retinal pigment cells, generating a
15
concentration gradient that promotes uptake of Vitamin A in these cells, and/or that PINTA
16
facilitates the presentation of all-trans-retinol to proteins that participate in the oxidation of all-
17
trans-retinol to all-trans-retinal (Pak et al., 2012; Wang and Montell, 2005).
SC
M AN U
TE D
EP
In the course of annotating vision-related proteins of the hawkmoth (Manduca sexta)
AC C
18
RI PT
1
19
genome, we discovered that lepidopteran genomes appear to contain a large number of
20
CRAL_TRIO domain proteins. While many of these gene family members are given names in
21
GenBank such as 'α-tocopherol transfer (ATT) protein', 'clavesin' or 'retinaldehyde binding
22
protein', in fact nothing is known experimentally about the function of any of these proteins in
23
insects, except for PINTA. To help begin to bridge this gap in our knowledge, here we
5
ACCEPTED MANUSCRIPT
characterize for the first time the CRAL_TRIO domain in insects and its molecular phylogeny.
2
Specifically, we determine phylogenetic relationships between gene family members from
3
insects with reference genomes, including the newly sequenced genome of M. sexta. We find a
4
number of conserved sites within the SEC14 domain. Further, we see large variation in the size
5
of the gene family, with lepidopterans (butterflies and moths) having the highest number of gene
6
family members among the species examined. We also make use of RNA-Seq data to examine
7
the expression of gene family members in M. sexta.
8
2. Materials and methods
M AN U
SC
RI PT
1
We searched the predicted proteins of the reference genomes of M. sexta and Danaus
10
plexippus for all homologs of the CRAL_TRIO domain family using tblastn and query sequences
11
from Drosophila melanogaster, Anopheles gambiae, Apis mellifera, Tribolium castaneum and
12
Bombyx mori homologs, as well as reciprocal tblastn searches of the nr/nt or RefSeq RNA
13
databases (top hits with e-values of < 1 × 10-10). We then performed tblastn searches of a de novo
14
assembly of the M. sexta transcriptome (ArrayExpress accession number E-MTAB-2066; Smith
15
et al., 2014) to find mRNA contigs that were identical to the predicted peptides. Comparison of
16
predicted peptide sequences from the reference genome against the predicted peptide mRNA
17
sequences permitted us to identify and correct errors in the predicted protein sequences. Isoforms
18
found in GenBank were excluded from further analysis, with the first annotated transcript
19
selected for analysis. The identity of the CRAL_TRIO domain-containing proteins was
20
confirmed through a comparison of their SEC14 domains to the Pfam database of HMM-based
21
gene family models using HMMER (Finn et al., 2011). Amino acid sequences were aligned
22
using MUSCLE with default settings. Neighbor-joining analysis of 455 amino acid sites was
AC C
EP
TE D
9
6
ACCEPTED MANUSCRIPT
1
performed using the Poisson model and pairwise deletion in MEGA (Tamura et al., 2011).
2
Robustness of the phylogeny was tested using 500 bootstrap replicates.
3 2.1. 3D modeling and sequence logo of the insect CRAL_TRIO/SEC14 domain
RI PT
4
The three-dimensional protein structure of the SEC14 domain (containing the
6
CRAL_TRIO domain) was predicted using I-TASSER (Roy et al., 2010), which uses both
7
multiple threading alignments and ab initio modeling with further refinement to obtain the most
8
likely 3D structure. Several shorter sequences were omitted from this analysis in order to
9
minimize gaps in the alignment. Thus the 3D structure was predicted from the consensus
10
sequence of 159 insect SEC14 domains, producing a top model with a confidence score (C-
11
score) of 0.87. The three dimensional structure was visualized in Jmol (Jmol Team, 2002) and
12
secondary structures were predicted using Jalview (Waterhouse et al., 2009) from the alignment.
13
A sequence logo was created in WebLogo (Crooks et al., 2004) to examine conservation of the
14
SEC14 domain across insects. Highly conserved amino acid positions were determined as those
15
with a bit score of > 2.5 with a frequency of > 70% at a particular position.
16
2.2. RNA-Seq Analysis
EP
TE D
M AN U
SC
5
Bar plots were made using the R package ggplot2 (Wickham, 2009) to visualize the
18
expression levels of members of the CRAL_TRIO domain gene family in M. sexta. This data
19
was obtained from Smith et al. (2014) and consisted of mRNA extracted from the heads of four
20
male and four female M. sexta adults sequenced on the Illumina platform (50 bp single-end
21
reads), followed by de novo transcriptome assembly using Trinity (Grabherr et al., 2011) and
22
RSEM (Li and Dewey, 2011) to assemble contigs that represent mRNA transcripts and obtain
23
expression levels. Raw count levels were normalized by transcript length using the FPKM
AC C
17
7
ACCEPTED MANUSCRIPT
(Fragments per kilobase of exon per million fragments mapped). FPKM was further normalized
2
between libraries in the R package NOIseq (Tarazona et al., 2011), using the trimmed mean of
3
M-values (TMM) normalization method. Differences in gene expression between males and
4
females were tested using ANOVA, and p-values were corrected for multiple tests using a False
5
Discovery Rate (FDR) calculated in the R package qvalue (Dabney and Storey, 2014). Genes
6
were defined as significantly differentially expressed when the FDR was < 0.05.
7
3. Results and discussion
SC
RI PT
1
Our GenBank searches yielded 43 predicted proteins in the CRAL_TRIO domain family
9
from B. mori, 12 from D. melanogaster, 14 from A. gambiae and 18 from T. castaneum. We also
10
found 6 from A. mellifera, although this number may have been underestimated due to the poor
11
initial annotation of this genome (Elsik et al., 2014). Our searches of whole-genome sequences
12
yielded 42 predicted CRAL_TRIO domain proteins in the Manduca sexta genome (Table S1)
13
and 38 in the Danaus plexippus genome. A variety of isoforms were also evident in each species.
14
A comparison of the blast-identified CRAL_TRIO proteins to existing Pfam HMM protein
15
family models showed the top hit for each to be the CRAL_TRIO domain family (Pfam
16
accession PF00650). All but four proteins had top hit e-values of < 1 × 10-10, the remaining four
17
having e-values of < 1 × 10-5. M. sexta e-values of matches to the CRAL_TRIO domain family
18
are shown in Table S1.
TE D
EP
AC C
19
M AN U
8
The insect SEC14 consensus protein structure prediction included four β-strands (Fig. 1A
20
and B), compared to six in the yeast phosphatidylinositol transfer protein (Sha et al., 1998) and
21
five in the human supernatant protein factor (Stocker et al., 2002). The overall insect SEC14
22
structure resembled that of human and yeast SEC14 domain structures, which includes a deep
23
hydrophobic pocket with the β-strands making up the floor of the pocket. Sequence logo analysis 8
ACCEPTED MANUSCRIPT
of the SEC14 domain, in which the CRAL_TRIO_N domain is embedded (Fig. 1C), indicates
2
the presence of 18 highly conserved amino acid residues 12 of which are conserved compared to
3
yeast SEC14p and 6 of which appear to be conserved across insects (Fig. 1C). The majority of
4
these conserved residues (12 out of 18) were associated with secondary structures (α-helices or
5
β-strands). Gupta et al. (2012) surveyed CRAL_TRIO domains across a broad range of species,
6
including yeast, plants, fish and humans, yet only one residue (position D178 in yeast SEC14p)
7
is conserved in both their study and here, providing further evidence that the six conserved
8
residues found here could be insect specific.
SC
RI PT
1
Phylogenetic analysis of the gene family (excluding isoforms) suggests numerous
10
lineage-specific gene duplications in insects but especially in lepidopterans (Fig. 2). Many gene
11
duplicates are in adjacent positions on the same genomic scaffold suggesting they are tandem
12
arrays of gene duplicates (e.g., M. sexta genes Msex005959-Msex005962). In particular, nine
13
genes from the lepidopteran expansion are tandemly arrayed on a single scaffold in M. sexta
14
(Msex010487-Msex010499), of which four were expressed in the head tissues (Table S1).
15
Similar patterns were seen for B. mori and D. plexippus genes within the lepidopteran expansion,
16
and duplication patterns suggest independent expansion events in each species (Fig. 2). This
17
suggests an important role for gene duplication during the evolution of lepidopteran
18
CRAL_TRIO proteins. Orthologs of the PINTA gene in Drosophila were not found in any of the
19
three lepidopteran genomes searched. Additionally, we found evidence of mRNA expression in
20
M. sexta heads for 30 CRAL_TRIO domain genes (Fig. 3). The majority of these genes had
21
appreciable levels of expression and one in particular, comp17579_c0 (Msex0051227; Table
22
S1), was expressed at a very high level. Few differences in the expression of CRAL_TRIO
23
domain
AC C
EP
TE D
M AN U
9
genes
were
seen
between
males
9
and
females.
Only
a
single
gene
ACCEPTED MANUSCRIPT
1
(comp20575_c0/Msex014871) showed significant differential expression after FDR correction,
2
having a higher expression level in males (Fig. 3). Clavesins, which are predominantly associated with clathrin-coated vesicles in
4
mammalian brain tissue, contain a clathrin box motif, LLALD, which is putatively involved in
5
clathrin binding (Katoh et al. 2009). We found this motif, which is the result of a four amino acid
6
insertion in the C-terminal domain, to be conserved in three orthologous lepidopteran proteins:
7
M. sexta Msex016323, B. mori XP_004922581 and D. plexippus DPOGS213497 (Fig. 2).
8
Msex016323 (comp35578_c0) is also one of the 30 CRAL_TRIO domain genes expressed in
9
heads, though at relatively low levels compared to other CRAL_TRIO genes (Fig. 3). We did not
10
find this insertion and precise motif in any of the lepidopteran proteins found in GenBank named
11
clavesin-1-like or clavesin-2-like except for B. mori XP_004929842 and Msex010326
12
(comp20338_c0), which contained a related motif, VLALD/N, in a similar position in the
13
protein. Unlike comp35578_c0 (Msex016323), comp20338_c0 is expressed at appreciable levels
14
in M. sexa heads.
TE D
M AN U
SC
RI PT
3
It is notable that we did not find an ortholog of the PINTA gene in lepidopterans (Fig. 2).
16
Currently, the identities of the retinoid binding proteins (RBPs) required for transformation of
17
vitamin A to the chromophore remain unclear in most insects. In Drosophila, all-trans 3-
18
hydroxyretinal can be converted into 11-cis 3-hydroxyretinal by light but it is also the case that
19
11-cis 3-hydroxyretinal can be made from dietary Vitamin A. The fly chromophore does not
20
rapidly dissociate from the opsin after light excitation, whereas in butterflies it can (Bernard,
21
1983a; 1983b; Stavenga and Hardie, 2011). This suggests that in lepidopteran species,
22
production of the chromophore may be even more dependent on enzymatic processes than in
23
Drosophila. It is therefore possible that some of the newly-identified CRAL_TRIO domain
AC C
EP
15
10
ACCEPTED MANUSCRIPT
1
proteins identified here have taken on that role as retinoid-binding proteins involved in the visual
2
cycle. We surveyed insect proteins containing the lipid binding CRAL_TRIO domain, including
4
the newly sequenced M. sexta genome. We found that the CRAL_TRIO domain gene family has
5
undergone a considerable expansion in lepidopteran species and that a number of amino acid
6
residues within the wider SEC14 domain appear to demonstrate insect-specific conservation.
7
Further, the majority of these genes are expressed in the head tissue in M. sexta. Our results
8
suggest that the evolution of CRAL_TRIO domain-containing proteins might play an important
9
role in insect evolution and in particular in the evolution of lepidopteran species, which is
SC
M AN U
10
RI PT
3
potentially linked to the evolution of visual systems.
11 Acknowledgements
13
We would like to thank two anonymous reviewers for their constructive comments. This project
14
was funded by a National Science Foundation (NSF) grant: IOS-1257627.
EP AC C
15
TE D
12
11
ACCEPTED MANUSCRIPT
References
2
Bernard, G.D., 1983a. Bleaching of rhabdoms in eyes of intact butterflies. Science. 219, 69-71.
3
Bernard, G.D., 1983b. Dark-processes following photoconversion of butterfly rhodopsins.
4
RI PT
1
Biophys. Struct. and Mech. 9, 277-286.
Bromley, D., Anderson, P.C., Daggett, V., 2013. Structural consequences of mutations to the α-
6
tocopherol transfer protein associated with the neurodegenerative disease ataxia with
7
vitamin E deficiency. Biochemistry 52, 4264-4273.
SC
5
Crabb, J.W., Goldflam, S., Harris, S.E., Saari, J.C., 1988. Cloning of the cDNAs encoding the
9
cellular retinaldehyde-binding protein from bovine and human retina and comparison of
11 12 13 14
the protein structures. J. Biol. Chem. 263, 18688-18692.
Crooks, G.E., Hon, G., Chandonia, J.M., Brenner, S.E., 2004. WebLogo: a sequence logo generator. Genome Res. 14, 1188-1190.
Dabney, A., Storey, J.D., 2014. qvalue: Q-value estimation for false discovery rate control. R
TE D
10
M AN U
8
package version 1.36.0.
Elsik, C.G., Worley, K.C., Bennett, A.K., Beye, M., Camara, F., Childers, C.P., de Graaf, D.C.,
16
Debyser, G., Deng, J.X., Devreese, B., Elhaik, E., Evans, J.D., Foster, L.J., Graur, D.,
17
Guigo, R., Hoff, K.J., Holder, M.E., Hudson, M.E., Hunt, G.J., Jiang, H.Y., Joshi, V.,
19 20
AC C
18
EP
15
Khetani, R.S., Kosarev, P., Kovar, C.L., Ma, J., Maleszka, R., Moritz, R.F.A., MunozTorres, M.C., Murphy, T.D., Muzny, D.M., Newsham, I.F., Reese, J.T., Robertson, H.M., Robinson, G.E., Rueppell, O., Solovyev, V., Stanke, M., Stolle, E., Tsuruda, J.M., Van
21
Vaerenbergh, M., Waterhouse, R.M., Weaver, D.B., Whitfield, C.W., Wu, Y.Q.,
22
Zdobnov, E.M., Zhang, L., Zhu, D.H., Gibbs, R.A., HGSC Production Teams, Honey
12
ACCEPTED MANUSCRIPT
1
Bee Genome Sequencing Consortium, 2014. Finding the missing honey bee genes:
2
lessons learned from a genome upgrade. BMC Genomics 15, 86.
4
Finn, R. D., Clements, J., Eddy S. R., 2011. HMMER web server: interactive sequence similarity
RI PT
3
searching. Nucleic Acids Res. 39(Web Server issue): W29-W37.
Fishman, G.A., Roberts, M.F., Derlacki, D.J., Gimsby, J.L., Yamamoto, H., Sharon, D.,
6
Nishiguchi, K.M., Dryja, T.P., 2004. Novel mutations in the cellular retinaldehyde-
7
binding protein gene (RLBP1) associated with retinitis punctata albescens: evidence of
8
interfamilial genetic heterogeneity and fundus changes in heterozygotes. Arch.
9
Ophthalmol. 122, 70-75.
M AN U
SC
5
Grabherr, M.G., Haas, B.J., Yassour, M., Levin, J. Z., Thompson, D. A., Amit, I., Adiconis, X.,
11
Fan, L., Raychowdhury, R., Zeng, Q., Chen, Z., Mauceli, E., Hacohen, N., Gnirke, A.,
12
Rhind, N., di Palma, F., Birren, B. W., Nusbaum, C., Lindblad-Toh, K., Friedman, N.,
13
Regev, A., 2011. Full-length transcriptome assembly from RNA-Seq data without a
14
reference genome. Nat. Biotechnol. 29, 644-652.
17 18 19 20 21
http://www.jmol.org/.
EP
16
Jmol Team, 2002. Jmol: an open-source Java viewer for chemical structures in 3D.
Katoh, Y., Ritter, B., Gaffry, T., Blondeau, F., Hönig, S., McPherson, P.S., 2009. The clavesin
AC C
15
TE D
10
family, neuron-specific lipid- and clathrin-binding Sec14 proteins regulating lysosomal morphology. J. Biol. Chem. 284, 276464-27654.
Li, B., Dewey, C.N., 2011. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics. 12, 323.
13
ACCEPTED MANUSCRIPT
1
Min, K.C., Kovall, R.A., Hendrickson, W.A., 2003. Crystal structure, of human alpha-tocopherol
2
transfer protein bound to its ligand: Implications for ataxia with vitamin E deficiency. P.
3
Natl. Acad. Sci. USA 100, 14713-14718. Ouahchi, K., Arita, M., Kayden, H., Hentati, F., Ben Hamida, M., Sokol, R., Arai, H., Inoue, K.,
5
Mandel, J.L., Koenig, M., 1995. Ataxia with isolated vitamin E deficiency is caused by
6
mutations in the alpha-tocopherol transfer protein. Nat. Genet. 9, 141-145.
RI PT
4
Pak, W.L., Shino, S., Leung, H.T., 2012. PDA (prolonged depolarization afterpotential)-
8
defective mutants: the story of nina's and ina's--pinta and santa maria, too. J.
9
Neurogenet. 26, 216-237.
M AN U
SC
7
10
Panagabko, C., Morley, S., Hernandez, M., Cassolato, P., Gordon, H., Parsons, R., Manor, D.,
11
Atkinson, J., 2003. Ligand specificity in the CRAL-TRIO protein family. Biochemistry
12
42, 6467-6474.
16 17 18 19 20 21
TE D
15
structure and function prediction. Nat. Protoc. 5, 725-738. Saito, K., Tautz, L., Mustelin, T., 2007. The lipid-binding SEC14 domain. Biochim. Biophys. Acta. 1171, 719-726.
EP
14
Roy, A., Kucukural, A., Zhang, Y., 2010. I-TASSER: a unified platform for automated protein
Sha, B., Phillips, S.E., Bankaitis, V.A., Luo, M., 1998. Crystal structure of the Saccharomyces cervisiae phosphatidylinositol-transfer protein. Nature. 391: 506-510.
AC C
13
Smith, G., Chen, Y.R., Blissard, G.W., Briscoe, A.D., 2014. Complete dosage compensation and sex-biased gene expression in the moth Manduca sexta. Genome Biol. Evol. 6, 526-537.
Stavenga, D.G., Hardie, R.C., 2011. Metarhodopsin control by arrestin, light-filtering screening
22
pigments, and visual pigment turnover in invertebrate microvillar photoreceptors. J.
23
Comp. Physiol. A. 197, 227-241.
14
ACCEPTED MANUSCRIPT
1 2
Stocker, A., Tomizaki, T., Schulze-Briese, C., Baumann, U., 2002. Crystal structure of the human supernatant protein factor. Structure 10, 1533-1540. Tamura, K., Peterson, D., Peterson, N., Stecher, G., Nei, M., Kumar, S., 2011. MEGA5:
4
molecular evolutionary genetics analysis using maximum likelihood, evolutionary
5
distance, and maximum parsimony methods. Mol. Biol. Evol. 28, 2731–2739.
7
Tarazona, S., Garcia-Alcalde, F., Dopazo, J., Ferrer, A., Conesa, A., 2011. Differential expression in RNA-seq: a matter of depth. Genome Res. 21, 2213-2223.
SC
6
RI PT
3
Voolstra, O., Oberhauser, V., Sumser, E., Meyer, N.E., Maguire, M.E., Huber, A., von Lintig, J.,
9
2010. NinaB is essential for Drosophila vision but induces retinal degeneration in opsin-
10 11 12
M AN U
8
deficient photoreceptors. J. Biol. Chem. 285, 2130-2139.
Wang, T., Montell, C., 2005. Rhodopsin formation in Drosophila is dependent on the PINTA retinoid-binding protein. J. Neuroscience. 25, 5187-5194.
Waterhouse, A.M., Procter, J.B., Martin, D.M.A., Clamp, M., Barton, G.J., 2009. Jalview
14
Version 2-a multiple sequence alignment editor and analysis workbench. Bioinformatics
15
25, 1189-1191.
TE D
13
Wickham, H., 2009. ggplot2: elegant graphics for data analysis. New York, Springer.
17
Zhang, Y., Zolov, S.N., Chow, C.Y., Slutsky, S.G., Richardson, S.C., Piper, R.C., Yang, B., Nau,
18
J.J., Westrick, R.J., Morrison, S.J., Meisler, M.H., Weisman, L.S., 2007. Loss of Vac14, a
20 21
AC C
19
EP
16
regulator of the signaling lipid phosphatidylinositol 3,5-bisphosphate, results in neurodegeneration in mice. Proc. Natl. Acad. Sci. USA 104, 17518-17523.
15
ACCEPTED MANUSCRIPT
Fig. 1. Protein structure and conservation of the SEC14 domain in insects. The three-dimensional protein structure was predicted using I-TASSER and visualized in Jmol from two different viewpoints (a & b) using the consensus sequence of 159 insect sequences (α-helices shown in pink, β-strands in yellow and 310 helices purple). The sequence logo (c) was created in WebLogo from 159 insect sequences and includes the approximate positions of α-helices (blue bars) and βstrands (red arrows) numbered from the N- to C-terminus, the majority of which are labeled in a & b. The height of each stack is proportional to the sequence conservation, measured in bits, and the height of each letter is proportional to the frequency of residues at that position. The position of the CRAL_TRIO_N domain is shown (dashed red line). Black arrows denote highly conserved amino acid positions with a bit score of > 2.5 and frequency of > 70% at that site. Numbers above arrows show the position of that amino acid within the yeast SEC14p protein (NCBI accession: NP_013796). Red stars indicate conserved sites between the insect consensus and yeast SEC14p sequences.
RI PT
2 3 4 5 6 7 8 9 10 11 12 13 14
Figure and table legends
SC
1
16 17 18
M AN U
15
Fig. 2. Phylogeny of the insect CRAL_TRIO domain proteins. The neighbor-joining tree was constructed in MEGA with 500 bootstrap replicates. Bootstrap support values of > 80% are shown as dots on branches.
19
27 28 29 30 31 32
TE D
EP
26
Fig. 3. Expression levels of members of the CRAL_TRIO domain gene family in the head tissue of Manduca sexta. Expression level is the mean normalized FPKM for female (red) and male (blue) RNA sequencing libraries. Error bars are the standard errors of mean female and male expression across four biological replicates of each sex. Contig IDs are from the Trinity assembly. Significant sex differences are denoted by asterisks, with significance defined as a FDR of < 0.05.
Table S1. Top BLAST hits of M. sexta genome annotations, and corresponding de novo assembled mRNA transcripts, to B. mori and D. plexippus. Table includes the BLAST hit description, e-value, percent identity, accession and the top-hit e-value from the HMMER search of the Pfam database.
AC C
20 21 22 23 24 25
33 34
16
A
B ACCEPTED MANUSCRIPT α7" α3"
α10"
α1"
α9"
α6" α9"
α3" α8"
N
α2"
β1" β2" β3" β4"
β2" β1"
RI PT
α1"
N
α5"
SC
α11"
43
58*
4
A
VDQ A
E
M D Q
G Q
K K
1 0 3 3
A
E
G Q
FI Y A
TD
AS
V
Q
Y
NL P
D
V
A
V
Q
V
LD
SR
E
E F M
E Q I
K
N
Q
S
T
M
M
A R
Q
K N
D
K
T
A
NA
L
S
P R
T
T
S
C
L
F
V
T
A
L
M
G G
RG
R L RK L R K L R APRK
TL
A A K I I
G
E
V
α3"
T
D
S
C
I
P
I
ER
ID
R
T M
A
A K
C
M
A K
L
V H
G
F
V
Q
E
S
V
P
K
W
D A
E
F LSN F
DW
P
S A K
E N
G
R
Y
S
Y
S
C
T
M
VV
I
R M
Q
P N R
H
TS
F Y
A
A K
H
V
C
T M
I
V
PS Y
P
T
A
AL
L
SA
N Y F H
E
Q
T C
R
M
N
V
V
I
F
R
L
A
LFF VA
LS
I
C
M
T
Y
FV
AY
H
S
S
T C
A
I
LV
G
M
T
T
C
I
C
K
QV
N A D A Y
I
T
C
α4"
S
E
R
I
F
F
S A
V KM L I KLH L MN
L
V
E K K N G
Q
V
Y
C
M
V
A I N M SY
A
E
A
S
N F H
T
N
Y EL A I
L
D
A
L
A L
E
Q
R
VV
L
T
H
C
P
T
AL
SA
E K K N G
K
D
YYT
S
D
P
L
Y
V E MA
C
Y
A
S
Q
Q
TKN R LC
F
I
N S
K
HR
LQT Y DS ER NPQ S I C
H
S
I
S VT R
A
H
I
AR
SSD
V E E
E K F T
N
I
H
T
A
M
TL
A A K I
N
TY K AK A
N
I
A
QG
F
S
M
F
LT
L R
W
D
EM
F
A
K
A
VL N F
I Y
C
M
F
P S
R
L D
NA
Q
P
H
L
F
D T
L
V
T Y
C H
N
R
Y
D
I
QV
N A D A Y
I
T
Q
S
R
E
H N
L
A
Q
E
N
V
Y
S H E V
S E G
N
V
E
K
H PA S N G C
Q
A A R
D
F
Q
N L C
D
Q
P N KK
S
K
Y
T
K
YYT
V
K
F V YSC
V E MA
W K
H
A
Q
QQ
H
TS
L
LQT
S
P N R
Y
N
V
C
V H
Q
S
TKN R LC
F
Y
I
QV
N A D A Y
I
T
Q
S
F
V
L R L KK GR GDV R K A Y FFSF K HRL ENL S Y I P IH D C M L YN S I P M M C
Y
S
N
T
E
L
L
A
Q
R
N
F
LT
I
M
VV
P D A
T
Y
D
G
H
Y
N
AN
V
I
P I R T F VE
L D T
V
R
C
M
H N
R
T M
A
A K
A K
V
K
YSC
V E MA
E
α2"
R M A
Q
I
W K
H
A
Q
S
Y N
T
N
T
I
I
V
S
A
PELFD I
FAL
E PELFFD PMELFFED L I P V E
Q
A
N C M D Q Y
M L
H
E
H
K
H
Q
R
I
K
A
AL
L
SA
N
E K K N G
Y
ED S
Y
V
Q
V
Q
K N
Y
K K
H
S
M
H
V
T
S
T
V
S
P R
T
A
L
K
N
KK V
E
G
R
I
T
LD
A
E
V
D
D D
L
N
W K
H
A
C
M
A
R
V
V
K
G
Q
E
S
I
V H
S
F
N
I F
Q
Q
E
P
L
L
V
F H
E
Q
T C
R
M
N
V
V
I
YYTD R K
H A P R KK H LP EAYPGGR K H LP EAYPGGR K H ID QE
T P
T
F
N
S
C
S
V
G PI
G
Q
I
V I
I I I
AN
F AY T I
I
I VT I KT P E F P E S Q TSS
M
V
G G
T
R
G
A
M
T
Y
S H E V
C H
S E G
N
D T
V
N
L
H
R
Y
DP
N L FM
L
A
T
W
EM
G
R
Y
Y
S
T
G
T
L L L L
T
V
T
D
M
A
N
F
VA I
C
M
T
FV
AY
Y
H
S
S
T C
A
I
LV
SG
M
E
Y
H
K
T
A
A
A
R
S Q
Q
A I N M SY
D
L
T
H
L
P
Y
S
S
V
PS
C
L
RTQ
Q
P
Q
C
H
A
S
F Q E
A
N
P
S
P
N S
H
C
F
P
Y
I
S
A
A
C
I
M
E
K
I
L
H
T
D
V E E
I
S VT R
A
H
I
V
VH S
F
H
A
V
N
S A K
E N
S
C
K D
SA
H
A
F
P
E K F T
N
I
H
I
T V
F
T NWK AK A
I
A
QG
F
T
F L
TQ
VS
K
V
T
A
VL N F
I Y
S
C
M
N V
G
S
S M
V
C
M
M
F
C
R C
Y
G
E
S
A
V
S
S
T I
D
V H
Q
V
V
A
Y
H
T
Q
S
I
S
F
G K VM
A F
D
E
M
T
AY
TG C E
F
Q
S
AM
C
FV
L GVI
V
P
LA
A
Q
S
MY
F
F
A
K
Y
C
R
V
H F
L
Y
F
P S
R
T
Y
T
KKH PA Q S N G C
A A R
D
V
I
S T
K
P D A
S
Y N
A
F
P
S T
D
E
T V M
Q
R
Q
S
A
E
Y
P
Y
T M
A
M
K
V
F
I
K
T
I
E
N
Q
I
RG
LV L V AA
N
A
A
W K
H
A
A
C
E
KK
C
C
A
A
N
T
N
L
C
T
E K
L
T
H
C
S
L
D
H
S
L
V
R
P
I
S VT R
H
H
M
Y
M L
N
Y
A
A
F
Y
Q
R T
T
P
L
K
G
Y
S
H
T
T
S
S A K
R
F
S
K A D S L F A V I M L I A RV A I M MG H E P A R VDA DA H V Q F M R M K T S
E
I
A
S
F
A
N
T
C
G
W
V
R
V
T
P S
K
R
A
G
N
E
H
R
Q
T
V
L
Q
S
R
T
W
T
Y
S
N
T Y
N
L
L R
Y
D
N
KN A
P
P
C
M
S
C H
D T
N
R
E F
A
Y
V
S E G
V
A
E
F
G
M
R K
D
L D
T
Q
N
PI V T G RI HT G RVTI P I E TP HE HE E TP
A A R
P D A
N D A
S
Q
I
N
A Y
K
I
V
K
K
N
Q
M
H
G
A
E
Q
I
H T
L
E
I
F Y
RA
E
D
C
E
K
I
A
V
T
T
I
Y
L
N
N
AT
V
D LREP
Y
T
Q
L
R
R
F
A
P
R
S
Y
D LP DKD R LPH KD R H
N
T
C
N
I P S V S LT G D S G N T Y V
A
T
T
L
QA
D
F
Y
C
I
A
G
M
V
N
H
S N
I D D L SN M SS EA LE M F N P R
L
A
FT
H
E
RY Q
V
T G
D
A L N
V S
S
I
H
R
A
R
T
V
Y
ND MI N EP D IN P IN P
I
L
M S A
C
E R
L
H
Q
P
KV
P
A
V
G
YNA DSW A
L L E F
R
L
D
Y
Q
G GK GK GK N
G
G PI
C
S
R
T
D
A
F
E Q R
T
F
T
E
LK
W WC
N
Y
L
V
A
Y
AS
Q
A
S
A
R
S
T
T Y
N
E K
K
A
C
F
A
RDP
V
N
H S K
D
Y
M
R
K
S
V
N
KQ PHLP KS R K
N
R
F
L
M
L
H
A
G
A
G
I
T
V
QA
AT
V
Y
F
R
S
L
T
S
K
S
D
E
A R
C
Q
A
T
Q
I LD A V N E RG LV I FA E F VL N K VL S R T QA A KKH K N AN P AA NWK Q I TD Y KA F I PA LV AL I AA F AY K Q DE L LA D K S I V SN D TEV K T I N L TL S I SA R T V G DSRK T M N V I K TQY EM I C VK T SSQ AA E S E PS N E E Q M H Y F M P NANV A EW D Y F A N L K SG SS V E ER P N E M E M RH QG M F N I A F Y S G K H E S P QQ F N R L F DYQ V I S K V QV M AQ CT V P KN I VV R V R H L NA L LN LT F N YS A D I A N A RE M S I M R A TS R E R Y T K F R
L
A N
F
Y
I
R
T
I
FT
R
A
E
Q
K G S N V
A
T
T
D
G
E P I P IK
L TA N I SQ E K A D LKEK Q L G AV
VH S
A RWDD K L R PWS E
R
S N
D
A
F
T
Y
A
S
R KK DP QE
I PKE
DL
K
GI
RV
I
V G D S A I YN VL L A G D A I F V L AD L N F KL RG G TA A K A F S VD A AS L DS I FVM L VL F WE SL AN R TV TN Q V M AHSVF QV KL KNY I Y E F I K N YL KH AVA A WKKN A L I L L NWKF M Y E AA K R FA I P A F M A FA A AY D Q AV I I D L I F L K D A I MG S I F SN A K S I P T N L FV T A F I QV T V G M I K E C V AN T SS F R D K F LK GV A VDA E E Q DA M T M P E A EV H WE M H F Y N F VNW WD M P T N LE SG E V G R M VS QM M F N L SY E C Q T N I S F Y M P S N L F A N R Y
H S M K
Y
M E
A I
Y A H
G
L
R
R
T
I
M
S
Y
Q
V
FT
S
Y
V
M
S
L
Y
Q
SS
R A
AA
T
G
T
Q
A
T
M A
R V
D
L
ATV AW I F QV K VM V
Q
Y
M
A
G
A
Y
L
R
T
I
A
K
S
V L
A RD
G I
PQ S F G
D
A
T
V S
C
LDEDL T
MG
F Y
L
L
P
I
F Y
RA
E
N
D
G
A L N
M S A
P
T
D
P
I
T
H
Y
A
H
K
C
I
L
L
A
V
M
E
T
N S
G A
S
I
AY V AA G L E W T Q R
Q
Y
L
R
G
K
T
V S
C
D
L
M A
R
A
I
I
S
Y
D N
V
FF
L
I
E
L G
W
VY KC K VT VM
0
SS
F
R
N
A
V
AA
T
M H
E
G
Q
T
Y
Y
N
β3"
KPFL S
G
T
VN A I
FM
A
V
VL
S
K
R
T
AY
α9"
KE
T
V
G
N
R
PQ S F
M
A
S
G
Y
I
K K
V
L
A RD G
D
A
G Y
A
F
N
M
T
S
T
K
I
L
S
N
W
M
F
T
R
N
V
QV
G
T
H H
DK
A L DA W
N
W
F T
S
C
KKV
F
QED A
V
M
N
L
N
S
P
A F
R
I
Q
T
A
Y
K
A
W
K
V
T
N
I
S
T
E
E
S
R M
Q
T
I
A
ES Y
E
N
N LD G SK T S D T D I GM SE F TGLYN
KT
H
F
S D
Q
E
L
Q
K
N
N
NA
R M
M S
V
T
E
P S
E
I
I
K
K V
C P Q
R
β4"
T
Q
F
Y
K
I
V
V
L
S
S M
T
V
I
K
Q
N
M
R
V
VPK
Q A
R
S Q
I
A
A
A M
T
I
D
A
D
F
I
L GVI
K
D
D SA
S
K
F
F Q E
A D
Q
I
L
A
P
F
Q
Q
T
S
S
K
Q
S Q
N R
D
A
VPDKD
I
A
T
D
EI
P Q
K
P
T KG D A N A E E S L L R D
K
D
I
Q
S
S
E
K
DL
E A
M
N
A
A RF
V
A V
I
H VA L
L
N
S
T
C
KY
RSM
A
E
H
K
A
D
H
I
M
N
I
S L F
V
A F
C
T
H
K
T
N
A
I
Y
H F
S
P
F V R
G
S
E D
A
V
M
L
N
I
P
T KG D A N A E E S L L VR H
H T N E H Y T RTQ P V C
H
E
Y
A V
N
H
K
T
M
N
D
C
M
H VA L
E
T
H
K
T
S
S
I
A
Q
S
S
H
K
N
R K
V
E
F
A
L
A
T
R
C
V
L
A
R
I
G
G
A
E
T
Y
Q
P
Q
T
A
S
M
V
N
P
LP EEYYGG LP GG A
T
L
N
A
S
E
E
T
F
V
N L
D H
K
A
A
F
I
K V
I
F Q E T
A
L
H
T
F FL
C
T
D
E
S
P
E
D
Q
K
P
G
GM LY F E I KV N NA A P M S I
K
M S
V
V
I
K
VH S
F
H
A
C
T V M
S D
K
SA
F
H
A
A
T
M
C P Q
I
T V
KKVI PKE QED
F
A
T
F L
TQ
VS
D
V
K
C
Q
E
G
S
S
Q
A
S
S
L GVI
D
S
S
N L FM
T
A
V
L
S M
A
R
T
D
H
I
N
V
D
I
S
M
N
T KG D A N A E E S L L R
A F
C
T
F V R
G
A
E
M
T
V
R
Q
H
I
P
E
A V
I
H VA L
H
E
L
H
D
A
T C
T
C
Y
A
E
T
MY
A
AY
K
T
F
S
M
I
A
D
H F
P
S Q
V
Y
P
A
S
VMK
G
V
S
S
E
H
P
C
V
F
AM
D
E
T
Q
A
T
AC
H
L
G
I
I
T
M Y Q R QY P F
V
A
I
Q
T
C
Y
S
T
K S
A
L
N
P
L
F
S
K
A
K
A
N
KT
R
F
QV
G
N
I
K V
E
N
Y
α10"
S
E
I
Q
HV L I Y V
N E
M KK SL MA
R
R
A
L
H
D A
H
S V V T R
A
V
M
F
Y
K
I
IR EV
T
A
T
T
F
F
S
T
A L
R
T
R M K
L V
D
L
N
M
T V M
N
C P Q
S Q T
W
N
Q
E
L
A M
Q
R T
A
F
T
T
A L DA W
I
Y
S
P V S
I
T
R
VY
S
TQ
G
S
I
H
C
K
Q
Q
A
T
AG R
H
H
T
F
A
N
K RI K RI R
S L
S M
F
S
A
Q
E
Q
Y
W
N
Y
A
F
A
V
K
T
R
A
E F
S A Y K
E T T Y V A G KT V F SY AM M GV L FQ S D K F K A A K L FQV E D SY L M I I F A I V M I I K N KV L N I L M A RV M H A AM A I N F MG T VA AM R H E AV F TQ P AA H R F M VD P DA FYF H I W V VS V Q F A M M MY S I R M K T S
F G V
G
M
P
G
M
F E
N
V
I
T
F
LGV
A A YD Q
N
I
I
K V
P
KL
T
E
W
NA
E
T
R T
W
N
Y
A L DA
S
GM LY
S D
N
C P Q
R M
M S
Q
N
R
F
F
P S
M
K
V
G
A
D S F A V I M L A RV A I M MG H E P A R VDA DA H V Q F M R M K T S
F
D
E
M
V
Y
V
H A N Y
F
E S
P
D
E
A
K
R
I
A Y
H
R
E
I
Q
QV
G
T
D A T
L
T
KT
K
D
W
N
W
S
N
R
Y
A L DA
I
R
C H
S E G
N
T
V
V
V
Q
K
C T V M
Q
E
V
N
D
S
H
Y
M
T
R
S
R
Q P D A
T
I
Y
T
SP LM I NK
Q SV
M
H
A MM
Q
T
H
A
L
W
S
G
D
Y
LF LF I IV VL F I R
K
M
I
K K
Q
R T
Y N
L
A RD
M S H I A A YT R
K
S K TQ T G S G
W
T
E
E
PQ S F
L
V
S
L
L
F
M
A
V
W
S
Y
S I Y P K
H
A MM
Y
G
T
G
R
T
Q
AA
V
M
T
N
K
T
Q
A
L
T
M
S
A
S
M
L
V
V
E
T TQ
H
I
T
T
Y
R
N
G
H
D
K
LT A
N
Y
V L
GA R
E
A
E
N
S
SS
I
C
V
P
V
Q
M H
G
K K G
L
R
F
KL G I W G VY KC K VT VM
F
S N
Y
M S
SA
Y
V
V
A
Y
V
T
L
K
M
Q S
Y
N
1
P
I
Y
T
E
T
M H
L
A P
L
V
V
A
F
R L
T
M
I
Y
E
D
M
N
K
Y
Y
R
T
V
F
N
Q KL VFRA I I S VL DA YN G I A L G WA F D T AGY SW A G AK DM TC V VA PQ K QR Y M T L L V E WM F AH I F S KY S QV MM F D K AV VM V L V A
A
N
A
M
A
G
K K
P H Q
I SV
I
G
I F L V E
Q
N
L
R
V
T
M
N
DSW A
KV
V
Y
T
V
F
S
S
S
L L E F
V
C
S
R
L
A
Y
Y
S N
A
H
Y
R
N
M
T
N
K
A MM
Q
E Q R
T
V
R K E
E
R
T
L
W
L
T
L
V
T
K
Y
H
K
I
A
V
S
M
G G YNA L
K
M
L
Y
G
C
N
K
RA
E
N
R
L
V
S
Y
VL G
A
AT
V
N
E
Y
I
V S
C
ED
KL G I W G VY KC K VT VM
M H
D P
E
L
L
P
QA
V
C
H
L
L V PF VI I LV PFLV I I L PFL T
M S A
F
Y
F
V
Y
F
I
L
S
A
R
A
R
T
A
V
P
V
L
E Q R
T
R
A
Y
Q
I
H
V
N
H
K
K
K
L
R
L
bits
I
R K V
M
H R R T
N
S
F
H
A
F
Y
bits
I
DY LQ
R
QG
R
T
L
A
ECFK KD
I VT I KT P E F P E S Q TSS
Q
S
I
A
V
R
T
W
C
L
ER F V Q
Q
V
E
Y
D A
T
G
TS
Y
N
LQT
S
P
L D
E
F AY
N
L
Q
C
S
R
V
RE R
V
S
F
Y
L R
W
LV L V AA
S
N
S
A
K
F
E
NA
Q
D
C
S
N
T
T
D
S
P N RH KN R LT
T T K P F LL 102* C D R P K R LGG L I VM KL Y N LD E I V K L A L Y F R P L D K R G R K V E G D KR SR Y L ENFF I PK T T IH K C Y S I I E Y I M A
G
E
T Y
N
E
DKEN T LE QAE D S GYS A
K
P
A
KQN GPHLP S R K
N
R
A
C
Q
N L C
D
E
FKNLF A
A
K
I
R M
Q
E
P
G AN N
Q
Q
IL
S TR
D
A
H
Y
N
T
T
QEV
QA
AT
V
D A
T
G
LI D YYT
L R L KK GR GDV R K FFSF K HRL ENL Y I P I HASY YN D C M L S P M Y I M C
TL
A A K I
L D
E
F
R
LT
I
M VV
N 0 4 2 2 4 3 T L L R P I R F LVF LL 1 D V CR LDRR E I QEG D N 1 KK DKFKNL A L G VL K L DE R I R K V Y VGK PEL TE G ELT NYKHA GL DK VMSK S DG L NL RL F KR D DL NLF K Y FDE ECF H E L I SF FF YN E VV Y D T I I LF S S I S EN Y I P I M TRL D I A A L D L I LLC Y 3 YV L S Y I I N P E E P 2 F M Y L G M M AN V C 0 0 L M L AKL L V VTL 2 L YL Y L N 1 I LM K I FC G I A QF GM C E TQV I V E N S TQF DLSFS I 4 L H QV VF R I I ED I I 4 T I VR LM I SLβ1" V L A MR Y KRW N I NM α4" L α5"TVL α6" L V SLGM I Q I QRM I F I YL P N T F L L C F P IL D LL 1 R D VV C E D R N D L A L 0 TD Y K V I K L LV I N K V PENYK G ELY GGD L NLSKF F DL KFQELF 3 E 178* I Y VV LLD M I I 3 S S 210 I V A A L L Y P 172 213 0 4 2 2 208* 4 3 L M L VTL L YL TG L V L Y V VF KL 1 I LM K I P I F I DA 1 DF S FNLL A QF GN CD E Q I V EA L MFL TR CR DLSF I V L H Q TD I E I VVGL SQ RN I E VLVK KF I D I EL L LVN I K VR I T I VEK I MNL KL V Y AL G LL TD NR GGL MR Y YK F D NMLDP D KFQELF YL LS E I V I M YS LYL M I WSV T L M I V QL I I I QR S P SF 3 N I V A A L L 2 L C Y P 0 0 F V 2 N GS E F Y 1 I FN I V Q MK Y NLDS H T A K K I REM F SP S I FN E GN 4 DI T P D E K H DP KD V 4 I RSL I N P QK VL I TT L VKK D I H K FL SL EM S LE I Y TL SRS G F R V KV V L L L LF V V L L YL A V F Y AKL M L T 1 I K I F L F G I S F A L G C E TQ I 0 I V Eweblogo.berkeley.edu M TQ I NLT VVV L DLSFSQV QL I L H LRM I ED I I VR I M 3 MRA Y KRW N I NM L S I 3 V SLGM LYL T Q I QRM I F P N LI C 0 2 2 N C 4 V α6"I FF α7" F VF N β2"L L LY α8" L VT GS β3" E Y L 1 S IL Y L Y V Q MK NL AK F ME KHVTL I 1 DF KGM I K SP NHE I A L SL E P F I R F I L G I D GN D A TNHS P CL F L Q V KM F I E E T K L H D K V D Q I V L I PRM Q Q N Q V TNF I S TT V S K L E DVR I I I DR SL I RS V E I E V I I LT MK VV II N SD RA LG I MK Y L KR SR G RL YM TF V S265/66* VL 3 L K VN LYL M I SL WS I T M Q I QL 260/61* L PF N A F LIM C 0 weblogo.berkeley.edu 0 243 2 216* 229* 252 4 F V K F N GS E F Y 1 S I Y V N H F E T MSP A LD K I I S IR M F N K VL S I E GN Q DI T P D E H DP KD VN QK I E L VKK D I H K I NK EM FL SL R LV 3 LT T P L S RS I Y TG F R V KV LS L A F 0 weblogo.berkeley.edu 2 VL G
bits bits
Q
I
FLR CK S E AK FLR CK S E AK G RV R FLR CK S E AKD GFLRRV CKRS E AKD
L E
L R
W
P I R T F VE
R K V
L D T
R
R
I D D L SN M SSV EA LE M F N P R
S
D
E
V
I
V
M
N
T
L PE I K
L
A
F
S
Y
DA
ATV AW I F QV K VM V
bits bits
I
L
I
Y
C
A
E
G
K K
N
H
R
I
FT
H
S
V
L
E
G
H R
V
Y
NA
D
D
E I KDL I
KK V
A
T
A
VDQ
KD
DY LQ
S N
G
E
W
NA
Q
KR
ECFK
L
G
H R R T
N
F
Y
D
E
D
A
K
P
A
N N
Q
L L L LH
F
Q
E
N L C
D
Q
T T
S TR P
83*
62*
L R KQN GPHLP CRAL_TRIO_N domain KS I
R
S
A
T
R
Q
R I EE
T
Y
D
W WC
N
H
T
E
LK
H S K
R
A
F
D
AG
H
Y
N
R
R
D
P I R T F VE
L D T
M
N
T
DKEN T LE QAE
RE R
V
C
F
A
D
D S GYS R AQ
M
M
S
H
A
K
Q
N
A R
Q
L
F
A RS L P
E I KDL I S
S
E
bits bits
I
Q
K N
T
T
S
E F M
E Q
K
V
P R
T
LR WL K LRRD WLG K LRRD WLG K
DA
E
K K
S
R K V
V
DY LQ
R
H R R T S N
F
α1"
PELFDLI A
M D Q
S KN
L
I
D
A
A
G
AL V
VDQ
NA
N
S
F
DI R I EE
A RS L P
KD
D DD LPDD LPDD
KR
ECFK
QG
R
A
C
F
T
T
M D Q
bits bits
RE R
V
S
H
Y
S KN
bits bits
Q
S
L
V
F
N 4 2 4 3 1 3 2 0 2 1 4 4
M
A R
Q
S
A
KQN GPHLP KS
DKEN T LE QAE D S GYS A
M
EP
bits bits bits
T
S
N
K N
V
P R
T
I
K
T
S
E F M
E Q
E
S KN
E I KDL I Q
DA
TE D
AL V
R I EE
A RS L P
0 3
LR WL K
DI
2
AC C
bits
3
M AN U
C
1 4
β4"
β3"
S
P
C
T
D
H
F V R
I
K
T
E K
GS
P K
G
P T
D A N A E E S L L R
D
K
D
I
Q
S
S
N
G
E
I
α11"
weblogo.berkeley.edu
C
908 930 4 004 629 XP x00 279 Bm s Mse S201 97 G M 72 94 1 O DP P 3 40 8 Dp Ag X FA0 378 5 E 2 Tc XP 6 2327 39 7 0 Am NP 310 9076 0 4 Dm XP 93 13 19 Ag 004S20 91 60 0 8 3 XP OG P 6 59 82 5 4 Bm DP m N P 62 Dp D g X XP A Am
Ms Manduca sexta Dp Danaus plexippus Bm Bombyx mori
Bm XP 004922800 Bm X P 0 0 4 9 2 2 7 6 2 Ms Msex005131 a Ms Msex0 Bm XP 0 05131b 0 4 9 2 2763 Dp DP Dp D OGS212706 Ms M POGS2127 05 Bm sex005 Bm XP 0049 127 X 2 2 P 758 Ms 00 Dp Msex 49227 Dp DPO 00512 57 4 Ms DPO GS2 Bm Ms GS 12755 Tc XP ex01 21275 4 A XP 00 195 D m X 96 492 9 Ag m N P 0 7589 7864 M X P 06 B s P 65 565 Dp m XMse 318 1882 388 Tc B P x0 02 XP F0 00 019 7 1 4 00 01 928 52 18 6B 97 10 2F 1 29 09 0
ACCEPTED MANUSCRIPT
SC M AN U TE D EP
AC C
DP Dp OG Bm DP E S2 X OG HJ 14 Dp Ms MP 00 S2 72 230 9 Dp DP se 492 142 49 D OG x0 9 29 Ms POG S2 038 378 Bm M S 14 04 XP sex 214 232 0 2 Dm 0049 038 33 0 Ag ABV 2924 5 8 8 X Ag P 32 2392 Dm XP 31 0369 8 NP Dm 6 052 NP 1051 Tc X 72481 0 Dp E P 97310 3 HJ 6 Dp EH 78510 J7851 Bm XP 1 0049 Ms com 29247 p5 Bm XP 0049 6142 29246 Ms scaffold0005 3 Bm XP 004929244 Tc XP 968368 Dm NP 730931 4 Ag XP 00343615 961 Ms Msex0058929 0492 Bm XP 0 HJ74795 Dp E 63328 5 6 P 006 Am X EHJ790832 Dp P 9732 Tc X 9703185 P 35 Tc X 01040 5959 0 NP Mse00 9084 m 7 B Ms EHJ 9085 0 Dp EHJ7 0596 1 3 Dp sex0 289 M 049 s 0 M P X Bm Dp
Bootstrap support >80%
64 02 09 1 31 18 65 93 7 2 6 XP P 9 310 S20 29 6 g A c X XP OG 059 883 4 T g DP ex0 492 63 A p s 0 29 D s M P 0 049 860 M m X P 0 000 B m X sex 311 B s M J75 460 M EH 969 77 Dp XP 3120 13 Tc XP 5723 61 Ag NP 0008 Dm Msex 00862 5 Ms Msex0 92963 Ms XP 004 2 Bm HJ7531 2928 Dp E P 00492 6 Bm X P 00492963 Bm X J70837 Dp EH 4929639 Bm XP 00 63 Ms Msex0008 Tc XP 966309 Tc XP 008197657 Tc XP 971470 Tc XP 971527 Tc XP 970129 Dp DPO Bm XP GS206990 Ms Ms 004928861 Bm X ex010502 Dp D P 00493 Ms M POGS2 3930 Bm sex0 06941 Bm XP 00 10499 Bm XP 0 49313 Ms XP 0 04931 68 367 Bm Ms 049 M XP ex00 3142 Ms s Ms 004 4519 2 M M ex 928 B s M sex 010 857 Ms m X sex 010 488 D M P 0 012 487 s 0 D p 1 Dp p E EHJ ex0 492 01 D D H 79 10 905 p P J7 2 49 EH O 9 91 2 4 J7 GS 292 Lepidopteran 92 20 89 69 expansion 88
7 2 49 05 10 29 x0 9 98 4 4 se 0 0 96 63 M 0 01 4 s XP sex 010 288 9 M x 9 Bm s M se 004 1048 5 M s M XP x0 049 M se 01 Bms M sex M sM M
Dp
Clathrin box
T Dp c X Bm D P P 0 X O 08 Bm Ms P 0 GS2 19 3 M 0 XP se 49 013 74 x0 30 4 1 Dp Ms 0 M 04 0 90 5 Tc DPO sex 93 629 6 Am XP GS 014 091 5 1 0 2 Dm XP 0 081 013 871 93 4 0 N Ms P 00 6566 2177 Bm Mse 126 455 0 Dp XP 00 x0107 581 DPO 49 0 GS 2562 2 Ms 2 5 Bm Msex0 15399 X 1 Dp D P 00492 4334 POG 5 S200 626 Dp D 02 PO Bm XP GS21539 2 8 00 Ms Mse 4925639 x01070 1 Tc XP 9723 45 Ag XP 317327 Bm XP 004931769 Dp DPOGS207391 Ms Msex002032 53 Bm XP 004922581 3 32 16 x0 Ms Mse S213497 Dp DPOG 87951 0016 Ag XP P 648579 Dm N 300003 304 PSCF p D D Msex018 557 Ms XP 315 939 Ag P 523 155 N 6 Dm GS20 9842 DPO 00492103262 p D XP ex0 031 Bm s Ms 0325 49212 M P0 97 8 X XP 209 42 Am Tc XP 3 6510 001 0 8 Ag NP F30 892 2 6 A C 2 PD PS 049 059 250 9 Dm p D P 0 ex0 931 703 08 D X s 4 0 3 PINTA M 0 4 Bm Ms P 0 GS2 00 X O ex s P m B D M Dp Ms
RI PT
Tc Tribolium castaneum Am Apis mellifera Ag Anopheles gambiae Dm Drosophila melanogaster
Contig ID 0
co
0 56 14 2_ c0 mp 65 60 co _c mp 0 67 17 co _ c0 mp 75 98 co _c mp 0 85 81 co _c mp 0 85 92 _c 0
_c
52 7
54
mp
co
_c
0
0
_c
69 3
50
mp
co
mp
co
72 9
45
_c
0
0
_c
57 8
35
mp
co
mp
co
45 1
34
0
_c
43 9
27
mp
EP TE D M AN U
SC
RI PT
400
co
0
_c
28 3
24
mp
co
0
_c
44 2
23
mp
co
0
_c
57 5
20
mp
co
0
_c
38 7
20
mp
co
0
_c
33 8
20
mp
0
_c
89 6
_c
0
300
co
mp
co
87 5
18
mp
co
17
0
_c
81 3
17
mp
0
_c
AC C
200
co
mp
0
_c
57 9
17
mp
co
51 2
17
100
co
0
0
_c
42 8
17
mp
co
9_ c
23
73
0
_c
11 6
17
mp
co
mp 1
co
0
_c
49 6
16
mp
co
0
_c
14 7
16
mp
co
_c
0
0
5_ c
49 0
13
mp
co
mp
co
19
6_ c
99
24
mp 1
03
mp 1
co
co
Expression level (FPKM)
ACCEPTED MANUSCRIPT
500 Female Male
*
0
ACCEPTED MANUSCRIPT
Highlights
• •
AC C
EP
TE D
M AN U
SC
•
The insect CRAL_TRIO domain containing proteins contain 18 highly conserved amino acid residues. We present the predicted 3D structure of the consensus insect CRAL_TRIO containing SEC14 domain. The CRAL_TRIO domain containing gene family shows a large expansion in lepidopterans with lineage specific duplication events. Thirty of the CRAL_TRIO domain containing proteins are expressed in the head tissue of Manduca sexta.
RI PT
•
1.00E-134 2.00E-133 1.00E-179 0.00E+00 4.00E-161 5.00E-158 2.00E-156 4.00E-117 5.00E-106 2.00E-165 5.00E-131 2.00E-92 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 6.00E-173 2.00E-83 1.00E-65 1.00E-97 5.00E-72 3.00E-100 2.00E-141 2.00E-86 1.00E-121 1.00E-83 4.00E-129 0.00E+00 0.00E+00 6.00E-132 3.00E-66 0.00E+00 7.00E-174 0.00E+00 5.00E-32 2.00E-76 8.00E-101
Identity Accession 85% XP_004929634 61% XM_004929578.1 65% XP_004929635.1 65% XP_004929639.1 78% XM_004928914.1 88% EHJ76881.1 71% XP_004929378 69% XP_004929248 75% AK382842.1 54% XP_004931422.1 54% XP_004922757.1 71% XP_004922758.1 60% XP_004922800.1 53% XM_004922706.1 93% XP_004928836 85% NP_001040355 78% XP_004928931 92% XP_004928929 81% XM_004928871.1 90% XP_004930908 91% XP_004930906.1 71% XM_004929785.1 42% XP_004928863.1 38% XM_004928806.1 46% XM_004928806.1 37% XM_004928806.1 45% XP_004928863.1 58% XP_004928863.1 43% XM_004928806.1 53% XP_004929052.1 44% XM_004928806.1 56% EHJ79293.1 80% XM_004925582.1 84% XP_004925625.1 61% XP_004927864.1 43% XP_004928863.1 82% XP_004925626.1 75% XM_004930854.1 92% XM_004922524.1 88% XP_004933182.1 52% XM_004929189.1 45% EHJ78511
ACCEPTED MANUSCRIPT
Top hit E-value of hit to Pfam CRAL_TRIO domain Amino acid sequence PREDICTED: alpha-tocopherol transfer protein-like [Bombyx mori] 4.40E-23 MPDQEIMQCVDLVTIKDWLSKQPHLPHDVDDILLLRFLTSCKFSLERTKRTIDFFFTIRHNAPELFCKRDPWAPEIRRVFEITDMLPLPNKTKENYKVFIYRLKSPELD PREDICTED: Bombyx mori uncharacterized LOC101746774 (LOC101746774), mRNA 2.00E-19 MSNITSQDIQYIRDWMAKEDYLPKDFSDMMIMKFLHSCYGSLEQTKQCIEKFCVGRLNMPEIFTNRDPLSPKLQTAFSITNMASYNAGECEILLYQIDDPTLENFDFY PREDICTED: uncharacterized protein LOC101746774 [Bombyx mori] 6.30E-20 MAVEMTNISNNDIQFIRDWMAKEDHLPKDFTDMMIKKFLHSCYGSLEQTKQCIEKFCAGRQNMPELYTNRDPLSPRVLTANSITTVATYNIEGDEILIHQLYDPNLE PREDICTED: alpha-tocopherol transfer protein-like [Bombyx mori] 4.80E-26 MDDKISDEVRKQLEDMKQWASTQPLIPKDIEDRILLRFIHSCYYDLEKAKVAADLFFSIRSASSELFTNRDPLAPAMQKTLQIVNLAQYVTSKNRNIWIWQLNDPGL PREDICTED: Bombyx mori alpha-tocopherol transfer protein-like (LOC101736577), mRNA 2.80E-24 MPVELEFDYDEATASMDKFSQDDIDELRAWTEKLDKSKCVPKDLTDKQLLLFYNACYGDMDKTKTCMEKYYACRKNAPEFFDNRVFNTQEMKSAQDVLEFSFL cellular retinaldehyde-binding protein [Danaus plexippus] 1.30E-34 MTEYAEYDWKLHQMQNERLCEAARKRLTDGDRQRALVALKDTVDSSLNLALRDKSRCQDDNFLRRFLYARKHDVQQSFELLVRYHQHRREHPELWATADGGV PREDICTED: alpha-tocopherol transfer protein-like [Bombyx mori] 1.40E-28 MSIRPLNPQLAEKAKKELFEDPSRIAADVQHLKEWIAKQPHLRARTDDQWLVTFLRGCKYSLERTKEKFDLYYSVRSTAPELFRIHINHAMFDEILSLGSMLILPKLA PREDICTED: clavesin-1-like isoform X1 [Bombyx mori] >ref|XP_004929249.1| PREDICTED: clavesin-1-like isoform X2 [Bombyx mori] 4.30E-28 MIRELPSDLAKIAKEELNEKPKRTKDDVKHLKDWIAKQPHLNARTDDQWLVAMLRGCKFSLERVKQKLDLYYTLRTTAPEVTLRILPTEDKFVEFFKLGSCLVLPK Bombyx mori mRNA, clone: fepM03J23 3.90E-20 MAFLQGPSLKQAEVIKQELGEGPGDLERGVRDLRSMLAASPHLPQPEAFDRHLLELFVRGCRMDLDRARSKLEAFCLARARHSDLYEHRSFTEPPLNDICKFMDIA PREDICTED: uncharacterized protein LOC101744959 [Bombyx mori] 1.40E-24 MPQDTQDLVQLNPDSVKYIRKLYGFEDEKKLEEAIDILDAWIKKQNHFTKKDFPRKYLESTILANKGSIERSKSKLDKLCTLRAILGFFFKFSNIRTDFSNFFKATTLIP PREDICTED: clavesin-2-like [Bombyx mori] 5.60E-13 MALKQISIQREYEKNPEISPEDIAQLREWLVTQPHLPGELLTDLDLLLVYHCCERSMQVSKQVLDLHYTLRTMFTQFFHNRSVDKKIEFTLDTVLICPLPTRSTDGNK PREDICTED: alpha-tocopherol transfer protein-like [Bombyx mori] 1.60E-21 MTADVVQPLPLEEEYKKKTRITPEDIKKFRQWLETQPHLPSDYITDMDLILTYHCCNRSSEVSKQVLDLNFTLRTLFTNLFKDRHPDDKMMKAFSYYLLLPLETRAH PREDICTED: clavesin-1-like [Bombyx mori] 1.30E-19 MTIINSYQQKINMSRLESFPLEEEYKKPTGITPQDIARLRQWLATQPHLPHEHITDLDLILLYHRYGCSIEVSKQELDNQYTLRTLFASFFKDRVLDEKIETVLNTVLIT PREDICTED: Bombyx mori alpha-tocopherol transfer protein-like (LOC101746971), mRNA 2.10E-19 MNSYVARSFPLEEEYKKPTGITPQDIARLRKWLATQPHLPQHITDLDLILSFHSCKRCMETTKKLLDTHYTMKTNFDAIFKNRIVDDKIELVLKRVLLNPLPTRTKDG PREDICTED: uncharacterized protein LOC101741307 [Bombyx mori] 3.50E-22 MGCWMKSQPHLPHIPDEYIFLFLHSNYYKLKETKDTIECYFTLRANTPDLFTNRDPLTPKNKTIMDITHMVALPNKTTEGYHVLLYRLADFDYSKLNFADAVRVFC cellular retinaldehyde-binding protein [Bombyx mori] >gb|ABF51246.1| cellular retinaldehyde-binding protein [Bombyx mori] 1.20E-28 MCNMKIKQELLVQPTGELWKAIRVELNEDVANNDRDLAAIREWLKKQPHLPDDWETGPLMTFLRGSSFSLEKCKRKLDMHFTMRAACPEFFTKRDASSPELSEIL PREDICTED: alpha-tocopherol transfer protein-like [Bombyx mori] 3.20E-29 MGINETSKFCGRQARKTNEPKMAVSLVQPSGEMWKKIRVELNEDINTRERDLATIKEWLRKQPHLPNEWDDNCLMTFLRGSSFSLEKCKRKLDMYFTMRAACPE PREDICTED: alpha-tocopherol transfer protein-like [Bombyx mori] 4.10E-28 MTLEQPAGEMWQKIREELNENVDTRDADLAHIKEWLKKEPHLPDEFDDQRIMTFLRGCKFSLEKTKRKLDMYFTMRTAVPEFFTNRDVARPELQEILNIVQMPPLP PREDICTED: Bombyx mori uncharacterized LOC101742743 (LOC101742743), mRNA 3.10E-09 MFGIVFNKQHNITHLSPKSFRMSPAASFAERWARDNLTRDPTPDELEVLGLFPITCDGDEIEASKRARRYYKARGPGGLTELWHKRHPDDEDILSSCQDVYIVPLRA PREDICTED: alpha-tocopherol transfer protein-like isoform X1 [Bombyx mori] 7.00E-44 MPAEAETRIVLRDVTPEISQHMLTDLEKLPGVQLGDFLLQFELDEPRESVREIARTELRETPEVVKPAIEELRRLLQEDKDLRVPLDSEAWMIRFLRPCKFYPESSYDL PREDICTED: clavesin-1-like [Bombyx mori] 5.40E-41 MASSEFRLERSVELLPETKEIAEKELRETPERVREALERLRELLKENKDLYFADDDELLTIFLRPCKWYPESALALMRRVAEFKRENASLLDGLLPEHEKDAFLEHK PREDICTED: Bombyx mori clavesin-1-like (LOC101736368), mRNA 4.30E-27 MTTASILMTVTTTWNVTNEKHEKDGGRLGHTLRIVVADAVKELRETSTIKDQTLHIMRDWIQQNNDIVNVRQDEAFLLRFLRHKKYSVPMAQQTLLKYLSLRKYY PREDICTED: alpha-tocopherol transfer protein-like [Bombyx mori] 2.80E-19 MDFFDDVFLKLHPNTVTTVRKLYNLEKREDMDRAIDALEDWLKKQYHFTKTDYTRDFLERSIIMTKGSVERAKTQISNLCSMRAVYPKFFEKCDARNDFNNLDSIA PREDICTED: Bombyx mori alpha-tocopherol transfer protein-like (LOC101745141), mRNA 3.30E-21 MDSINDILLKLHPDTVTTVRKLYNLEKREDLDRAIDALEDWLKMQHHFTKRDFPRDFLERSIIMMKGSVERAKYQISNLCSMRAMYPKFFDKYDAKNDFNKLDYF PREDICTED: Bombyx mori alpha-tocopherol transfer protein-like (LOC101745141), mRNA 1.10E-17 MESLPKDPILEFNDNTFKYMRKEYKFEKNEDMNAAIDAFEDWIAKQNHFTKKKYPRDYLERLIILQKGSVERAKTHMEKLCTLRTLLPQFFEHYDVKKDCIPFINGA PREDICTED: Bombyx mori alpha-tocopherol transfer protein-like (LOC101745141), mRNA 6.00E-19 MDAVPEDKLLRYTPDTMQALRRLYNLDKPGDMDFAITALQDWLKKQHHLTKKDYSRSYLERLIILQKGSLERSKAQIENICAMRTLLPEYFEINGDEEIVKLDGLS PREDICTED: alpha-tocopherol transfer protein-like [Bombyx mori] 9.30E-17 MESLPSDPILEYNDDTFESLRKQYNFEKKEDMDAAIDVFEEWLKKQNHFTRKEYPREHLERLIILHKGSVERAKTHMDKLCTLKTLLPHFFEHYDVRNDCASYIND PREDICTED: alpha-tocopherol transfer protein-like [Bombyx mori] 2.40E-23 MEELPRDKILTYNPDTLKAVRYEFNLENPGRLEEAINILEDWIKKQDHFVKKDFDREYLERTIVTTKGLVERAKARMDKLCTFKTLLPNFFQILEEKRHQSIAEKLSH PREDICTED: Bombyx mori alpha-tocopherol transfer protein-like (LOC101745141), mRNA 5.00E-18 MQAIPKLPILEFTDDALESIRKCHNLHDPGRMEEAVDILCEWIQKQEHFLRKDFPRDYLERIIISTKGSLEKAKVRLDKICTMKTLAPKYFQDQDLRDVKHVLDGFKF PREDICTED: alpha-tocopherol transfer protein-like [Bombyx mori] 7.60E-19 MESLPCNPILKFKPDTLDVIRKSFNLDQRELNEAIDRFEEWLNKQEHFVKKDFSRDYIERTIIQGKGSVEKAKSQMDKLCTLKTLLPKFFKNCDIKTEFKDISQIAWLI PREDICTED: Bombyx mori alpha-tocopherol transfer protein-like (LOC101745141), mRNA 9.20E-20 MEELKPTNILKFNPNQMADVRKVFGYGDIGILKQDIKILEDWINKQNHFIVRDFGRDYLERFLIYNKGSVERAKKHFDRLCTFRNLMPEFLRDFDVRNEFVDLFKCS putative CRAL/TRIO domain-containing protein [Danaus plexippus] 5.60E-18 MERPPHHPLVIVTEDDIKAIREIYNIGDENKINESLNAIQEWIQKEEHLVEASKYLSRRNLERVLLIAKGSVEETKRRIERLLTARGMIPELCLNKTLEEFEDIWDAVNY PREDICTED: Bombyx mori clavesin-1-like (LOC101744403), mRNA 6.30E-27 MKTDIKMWSFEEIAFQAEMNRHEDPDSEEDAFVLCGEDPTTRNKKIVEFRNLIKERNECHPHRTDDAYLLRFLRARQSIPARAHRLLVRYCNFREANPHLWRDVDW PREDICTED: clavesin-1-like [Bombyx mori] 6.70E-31 MPFIDIAFQAEISRYEDPEFEEFAKKNCNENFETRGKAIDELRRMIREKGECCPRRLDDAYCLRFLRCRRFIPALAHKLMVRYEEFRHKNAYLYNCDAFGLQKVKDV PREDICTED: alpha-tocopherol transfer protein-like [Bombyx mori] 1.70E-17 MTVSQIAMEIEYNKETNIKPEVILRLREWLQKQAHMPHDHITELDIILAYHCCDCDAEITKRVIDLNFTARTLFSFYQNREINYSLETALHTWLVTPLDAATNKGYRP PREDICTED: alpha-tocopherol transfer protein-like [Bombyx mori] 2.00E-15 MGSCDDELLKLHSDTVSTLRKLYKLDKREDMDRAIDALEDWLKKQYHFTKREYPREYLERSIIFMKGSVERVKLQISNLCSMRTIFPNFFEKVDAINDFVHYEDPV PREDICTED: clavesin-2-like [Bombyx mori] 2.50E-26 MYECFLEIAYEAELNTEENEEMLELAKELCNENPATRATAITDLRQMIYERGECRPWRTDDAFLLRYLRARNFVVPRAHKLLVRNCVFREQHNQLYEGVDLWGLV PREDICTED: Bombyx mori alpha-tocopherol transfer protein-like (LOC101742992), mRNA 9.70E-36 MPTPDYSVDIDLGEPPPELEDYARQHCGEDPSTKLQAIDELRDMIYERGECTPHRMDDEFLVKFLRARNFIPQRAHRLLVNYYQFKEENPELFDNVYPLDLQGIGDA PREDICTED: Bombyx mori retinaldehyde-binding protein 1-like (LOC101739669), transcript variant X1, mRNA 3.30E-30 MATTVLQPEECRREHEPKPNIIDSPRLTDDERVQILEAKERETGDLPPELREKARMELREEPALREQALAQMRHFIDKHPAIKKCRTDAPFLLRFLRTKKYSIPQACSM PREDICTED: alpha-tocopherol transfer protein-like [Bombyx mori] 7.00E-11 MRHKEIYGVNVHPTIKFALDFGMALISEKIRKRVKLYTSIEDVEIDKSVLPKEYGGVMPMRQMI PREDICTED: alpha-tocopherol transfer protein-like [Bombyx mori] 2.10E-12 DQWFLAFLRGCKHDVEKSKRKLEKYYTLKKSAPDFFANRNPTDEKIQALLKLGLPLRQCVKEDSPRTCLVRVSQLCNSQYSLVDLTKIGFMTAEILLLEDDNFTVV hypothetical protein KGM_17942 [Danaus plexippus] 3.10E-25 MTVRALSSVLRKKAEEEVNETPNVADAAIAQLRDWLHQKPHLRAAKIPDQWLIAFLRGSKFNINKAQEKLDTYYTLKTIAPEIAANRHPFDASIQEILKIGFFLPLRK
RI PT
E-value 0.00E+00 2.00E-125
EP
TE D
M AN U
SC
Manduca sexta Trinity mRNA contig Expressed in head? yes no yes yes yes yes no no yes yes yes yes no no yes yes yes yes yes yes yes yes no no no no no yes yes yes yes yes yes no yes yes yes yes yes no no yes
AC C
Manduca sexta gene
Msex000860 comp16496_c0 Msex000861 N/A Msex000862 comp34451_c0 Msex000863 comp17428_c0 Msex001952 comp17116_c0 Msex002032 comp23442_c0 Msex003804 N/A Msex003805 N/A Msex004308 comp18896_c0 Msex004519 comp8581_c0 Msex005124 comp17875_c0 Msex005127 comp17579_c0 Msex005131a N/A Msex005131b N/A Msex005929 comp8592_c0 Msex005959 comp6560_c0 Msex005960 comp50693_c0 Msex005961 comp45729_c0 Msex005962 comp54527_c0 Msex006294 comp24283_c0 Msex006295 comp6717_c0 Msex010326 comp20338_c0 Msex010487 N/A Msex010488 N/A Msex010489 N/A Msex010492 N/A Msex010495 N/A Msex010496 comp17512_c0 Msex010497 comp17813_c0 Msex010498 comp13490_c0 Msex010499 comp16147_c0 Msex010502 comp20387_c0 Msex010701 comp173239_c0 Msex010702 N/A Msex011959 comp7598_c0 Msex012101 comp124195_c0 Msex014334 comp27439_c0 Msex014871 comp20575_c0 Msex016323 comp35578_c0 Msex018304 N/A MsexScaffold00053 (961518-965039) N/A N/A comp56142_c0
ACCEPTED MANUSCRIPT
RI PT
NDKRWAVDDSLRPLNSKDDRVDKVKELPGSFRSLALD NSVEESREFYLDDNLWKAIPTKKSKAMENAMNGSFRTLSID WKGLAAKKSKSADNAMNGSFRTLSID VFMKKSLDLFSKNREHFLNDNLWRVDKKSKNKDVSTEIGSFRSLAID KFYETVIPRSSLPPDHGGTLPDTQTLHKKCMQQLQLLEPYFKAEEEQRINALPDKKREKAMEKAFKNLDID ERIVLHGNNLSTLHDALPLDILPAELGGEGPSYNSERWLQEFCRSENINPAPVTPVTTAASDDNDLPSAKLDKAYKQKDNANFSFNGSEKSAKSELLRDKD IIDYWKKKVQEYRVFLEEDQTFGTDESKRPGKPKTAEDMFGVEGSFRQLQFD HIPKEIVPTEYGGDGGPLVDIIEYWVSKIKEYQDWMRQEVKFGTDESKRAGKPMVKEMAGLDGSFRKLEID LLEHISASVLPQEWGGEEEPIDVLVRKWRQRIDESREFLRDLNELCCAVPAQIADAEVYGTMGSFRKLDID SYKEIKDSWIELLSSKESMEYLKASREATVNEAYRLKSDYNEHLGLAGTFRTLSID ARPNNVDNIFGIQGSFKKLDID MEALPKEAGGQYKDFHTAQADILAKIKGSPAFFAEENKKRVKEAVRPGKPKTITDIFGGVEGSFKKLEID YKTLEELREETQAMMRANKDFFVEENKKRVKESLRPGKPKTITDIFGSVEGTIFLKQG SFHSCKRCMESTKKLLDTHYT LIETEMFKADEKKRIKKPSKGMFASFTSSFKSLDID DCSTMEALHEQWTMKLQEYRDWFKRQESIKANEELRPGKPTNYDELFGIDGSFRQLTID PFLKEKMKNRIFFHTDITELQKYIPKEIIPIEYGGNGCSMEEVNIAWMKKLEEYIPWFKEQENIKADEALRPGKPTNYDELFGIDGSFRQLSID YVPKDMLPAEYGGNAGPMNDLHNAWVKKLGEYTDWFTENENIKANEALRPGKPTNYDELFGIDGSFRQLVID GAMKDLNDEWTKKLLSNAKWLQTEERKFYDTDLRPEVRTRKNKNAMRTLRSNIGSYDMITRAHSCRSLHTRDDLDEGLYGAYRTLKVKPDKFYV EKLRSRIIFMGNDRDLLHKHISPKCLPESYGGTLTIPSVTGAQWLELLLMCDREFMAINSYGYKKK HGSKMSSLHKHMSPSHLPADYGGELPAIDYSGAEWYPVINDILPHFDKMELQLNEDGKPYVMW FAISKVTPKMAERLHIYTNLKQLHKQTDAECLPTIYGGPIELQNMIVYTKQLLGEQRKKVLALDEMLITNTRGIISSRAKNVKSDTTSVEGSFRKLEID IAVLKEQWKDLLGTKENVNYMQEMSNAVSNEAYKIKDKYNLDGAG MAVLQGMNNGKMFLVRKKTLITCKR ESLQVTHDKWIDVLSSKEFVEFIQVMNQGRTNESLRLNDTVKEQYMGIAGTFRTLSVD ADRHMKWVQALTSKKMTEHLKEMSEAKSTQGYKMKDTISDPDLGISGSFRSLNVD QESLESIYEKWLDVLTTKEFMEYLQVMNQARSREDLRLNDTFKEEYMSIAGTFRTLSVD TLYDYVPKSMLPSEFDGDGKSLRELHQQWIEVITSEEFTQYLKEMNTATTNEAYRQTDKFNEQYLGMPGSFRTLSVD KSIAELHDKWFEVLNSKEHREYIKETVKAGTNESRRLINKLNDEYLGMPGTFRVLKID VEELNKKWMEIISTEEHTAYMKEMNGACTDESRRSKLDFNNEYIGMPGTFRTLTVD KELSEANFKELSTDEHIARLKFLEKAATNEDLRLSCKFNEEYSGLPGAFKTLCVD MSCDKLNELWKETLKSEESQKIIRDTEKLVSDETKRHESKFNEEYMGMPGSFRRLTVD KVYMHGYDITSLHRYIDPECLPKRYGGYRDPVSLERWLTKIKEYKNKSFDNDMRNLGYGVD RHVVTTEEWLAKVYKYRDDFLVQELRDLGFYVK KANKDFFEMESKKRVDETKRTERPKTISSFFPNMEEAFKKLEID
HGSDLKSLQQHISLDCLPPRYGGTCRSHVSFAHWLKKVKKYRDEKFDREMKDLGYVIKE VWPDYPYTIWFESLRKNYHVAKEMVACGYKFREEEISADVVRKLKEDGIQLS CTHFVNIPQYGVRFFEFAVSLLSDKLKDRVMFHRACDDLTKHVDPAILPKEYGGTVPLKDMIEELKRNLLKHREELLALDDLSIDLYALEKNDLTQDIHSTAGSFRKLELD
AC C
EP
TE D
M AN U
SC
QDLIDHWKMKVETYSDWYAQHDSIRPDESKSSDAKNSANVLGIVNCFKKLDID