Accepted Manuscript Molecular evolution and expression of the CRAL_TRIO protein family in insects Gilbert Smith, Adriana D. Briscoe PII:

S0965-1748(15)00026-0

DOI:

10.1016/j.ibmb.2015.02.003

Reference:

IB 2669

To appear in:

Insect Biochemistry and Molecular Biology

Received Date: 1 December 2014 Revised Date:

29 January 2015

Accepted Date: 3 February 2015

Please cite this article as: Smith, G., Briscoe, A.D., Molecular evolution and expression of the CRAL_TRIO protein family in insects, Insect Biochemistry and Molecular Biology (2015), doi: 10.1016/ j.ibmb.2015.02.003. This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

908 930 4 004 0062979 XP x 2 Bm s Mse S201 97 2 M OG 317 094 4 8 DP P Dp Ag X FA0 378 5 E 2 Tc XP 6 2327 39 7 0 Am NP 310 9076 0 4 Dm g XP 493 013 19 A 00 S2 91 60 0 8 3 XP OG P 6 59 82 5 4 Bm DP m N P 62 Dp D g X XP A Am

Bm XP 004922800 Bm X P 0 0 4 9 2 2 7 6 2 Ms Msex005131a Ms Msex00 5131b Bm XP Dp DP 004922763 Dp DP OGS212706 O G Ms M S212 Bm sex005 705 Bm XP 004 127 Ms XP 004 922758 M Dp sex 9227 Dp DPO 0051 57 Ms DPO GS2 24 Bm Mse GS2 12755 Tc X x0 127 A XP P 00 1195 54 D m X 96 492 9 Ag m N P 0 7589 7864 M X P 06 B s P 65 565 D m Ms 31 188 388 Tc p B XP ex0 802 2 XP F0 00 019 7 1 4 00 01 928 52 18 6B 97 10 2F 1 29 09 0

ACCEPTED MANUSCRIPT Ms Manduca sexta Dp Danaus plexippus Bm Bombyx mori

64 02 09 1 31 18 65 93 7 02 06 P 9 X 1 2 29 6 g P 3 S A c X XP OG 059 883 4 T g DP ex0 492 63 A p s 0 29 D s M P 0 049 860 M m X P 0 000 B m X sex 311 B s M J75 460 M EH 969 77 Dp XP 3120 13 Tc XP 5723 61 Ag NP 0008 Dm Msex 00862 5 Ms Msex0 92963 Ms XP 004 2 Bm HJ7531 2928 Dp E P 00492 6 Bm X P 00492963 Bm X J70837 Dp EH 639 29 49 00 Bm XP 63 Ms Msex0008 Tc XP 966309 Tc XP 008197657 Tc XP 971470 Tc XP 971527 Tc XP 970129 Dp DPOG Bm XP S206990 Ms Mse 004928861 Bm X x010502 Dp D P 00493 Ms POGS2 3930 Bm Msex0 06941 Bm XP 00 10499 Bm XP 0 49313 Ms XP 0493 68 Bm Ms 0049 1367 Ms XP ex004 31422 51 M M 00 M s M sex 4928 9 B s M sex 010 857 Ms m X sex 010 488 D M P 0 012 487 s 0 D p 1 D p EH ex 49 01 D p D EH J79 010 2905 p P J7 2 49 EH O 9 91 2 4 J7 GS 292 Lepidopteran 92 20 89 69 expansion 88

SC

M AN U TE D EP AC C

7 2 49 05 10 29 x0 49 498 se 0 0 96 63 M 0 01 4 s XP x 10 88 M m Mse ex0 492 89 s 0 04 B s M s M P 0 01 495 M m X sex 010 B s M sex M sM M

Bootstrap support >80%

Insect CRAL_TRIO domain proteins

Dp DP Dp D O Bm DP p GS OG EH 214 X J Dp Ms MP 00 S2 72 230 9 Dp DP se 492 142 49 D OG x0 9 29 Ms POG S2 038 378 Bm M S 14 04 XP sex 214 232 0 2 Dm 0049 038 33 0 Ag ABV 2924 5 8 8 X Ag P 32 2392 Dm XP 31 0369 8 NP Dm 6 052 NP 1051 Tc X 72481 0 Dp E P 97310 3 HJ 6 Dp E 78510 Bm XP HJ78511 0049 Ms com 29247 p5 Bm XP 0049 6142 29246 Ms scaffold00053 Bm XP 004929244 Tc XP 968368 Dm NP 730931 Ag XP 003436154 5961 Ms Msex00 929 4928 Bm XP 00HJ74795 E 8 Dp 56332 6 P 006 Am X EHJ790832 Dp P 9732 8 X Tc 031 P 97 355 Tc X 01040 5959 0 0 0 P e N 084 Bm Ms Ms HJ79 085 E 9 0 Dp EHJ7 0596 1 3 Dp sex0 289 9 M 4 Ms P 00 X m B

Clathrin box

T Dp c X Bm D P P 0 XP OG 08 M Bm s S 1 M 00 2 93 XP se 49 013 74 x0 30 4 1 Dp Ms 0 0 M 4 0 90 5 Tc DPO sex 93 629 6 Am XP GS 014 091 5 1 0 2 Dm XP 0 081 013 871 93 4 0 N Ms P 00 6566 2177 4 1 M Bm se 260 55 Dp XP 00 x0107 581 DPO 49 0 GS 2562 2 Ms 2 5 M sex 15399 Bm 01 X Dp D P 00492 4334 POG 5626 S Dp D 2000 PO 2 Bm XP GS21539 2 8 00 Ms Mse 4925639 x01070 Tc XP 9723 1 45 Ag XP 317327 Bm XP 004931769 Dp DPOGS207391 Ms Msex002032 53 Bm XP 004922581 23 Ms Msex0163497 13 S2 Dp DPOG 87951 0016 Ag XP P 648579 Dm N 300003 PSCFx018304 D p D se 557 Ms M XP 315 939 Ag P 523 155 N 6 Dm GS20 9842 DPO 492 326 Dp XP 00 x010 312 0 e Bm s Ms 0325 49212 M P0 97 8 X XP 209 42 Am Tc XP 3 6510 001 0 8 Ag NP F30 892 2 6 A C 2 PD PS 049 059 250 9 Dm p D P 0 ex0 931 703 08 D X s 4 0 3 PINTA 0 M 4 Bm Ms P 0 GS2 00 X O ex P s Bm D M p s D M

RI PT

Tc Tribolium castaneum Am Apis mellifera Ag Anopheles gambiae Dm Drosophila melanogaster

ACCEPTED MANUSCRIPT

1

Molecular evolution and expression of the CRAL_TRIO protein family in insects

2 3

Gilbert Smith1* and Adriana D. Briscoe1

4

1

5

*Corresponding author

RI PT

Department of Ecology and Evolutionary Biology, University of California, Irvine, CA 92697

6 7

SC

8 9

M AN U

10 11 12 13 14

18 19 20 21 22 23 24 25

EP

17

AC C

16

TE D

15

Keywords: Alpha-tocopherol transfer protein; retinoid-binding protein; CRAL_TRIO; SEC14; gene duplication

26 27

1

ACCEPTED MANUSCRIPT

1

Abstract CRAL_TRIO domain proteins are known to bind small lipophilic molecules such as

3

retinal, inositol and Vitamin E and include such gene family members as PINTA, α-tocopherol

4

transfer (ATT) proteins, retinoid binding proteins, and clavesins. In insects, very little is known

5

about either the molecular evolution of this family of proteins or their ligand specificity. Here we

6

characterize insect CRAL_TRIO domain proteins and present the first insect CRAL_TRIO

7

protein phylogeny constructed by performing reciprocal BLAST searches of the reference

8

genomes of Drosophila melanogaster, Anopheles gambiae, Apis mellifera, Tribolium castaneum,

9

Bombyx mori, Manduca sexta and Danaus plexippus. We find several highly conserved amino

10

acid residues in the CRAL_TRIO domain-containing genes across insects, and a gene expansion

11

resulting in more than twice as many gene family members in lepidopterans than other surveyed

12

insect species, but no lepidopteran homolog of the PINTA gene in Drosophila. In addition, we

13

examined the expression pattern of CRAL_TRIO domain genes in M. sexta heads using RNA-

14

Seq data. Of the 42 gene family members found in the M. sexta reference genome, we found 30

15

expressed in the head tissue with similar expression profiles between males and females. Our

16

results suggest this gene family underwent a large expansion in Lepidoptera, making the

17

leptidopteran CRAL_TRIO domain family distinct from other holometabolous insect lineages.

SC

M AN U

TE D

EP

AC C

18

RI PT

2

2

ACCEPTED MANUSCRIPT

1

1. Introduction The CRAL_TRIO domain is an N-terminal ligand binding region within a larger protein

3

domain (SEC14) that takes its name from common sequence elements in cellular retinaldehyde

4

binding protein (CRALBP) and the signaling domain of the TRIO guanine exchange factor

5

(Panagabko et al., 2003). The domain was first characterized three-dimensionally from structures

6

of yeast phosphatidylinositol transfer protein SEC14p (Sha et al., 1998) and human supernatant

7

protein factor (SPF; Stocker et al., 2002) but is present in most organisms and is likely to be of

8

ancient evolutionary origin (Saito et al., 2007). The domain structure is comprised of alternating

9

α-helices and β-strands (Fig. 1) that bind small lipophilic molecules such as retinal, inositol,

10

squalene, and Vitamin E, the latter of which is composed of α-, β-, γ-, and δ-tocopherols

11

(Panagabko et al., 2003). CRAL_TRIO domains are found alongside many other domains in

12

proteins, including the Golgi dynamics (GOLD), protein tyrosine phosphatase (PTP), RasGAP,

13

RhoGAP, RhoGEF, pleckstrin homology (PH), macro, spectrin repeats (SPEC), Src homology

14

(SH3) and protein kinase domains (Gupta et al.,2012; Saito et al., 2007).

TE D

M AN U

SC

RI PT

2

Relatively few members of the large CRAL_TRIO gene family have been studied for

16

ligand specificity. Those that have been studied include human α-tocopherol transfer protein (α-

17

TTP), SPF (which binds squalene), cellular retinaldehyde binding protein (CRALBP, which

18

binds retinal) and Saccharomyces cerevisiae phosphatidylinositol transfer protein (SEC14p,

19

which binds phosphatidylinositol and phosphatidylcholine). Ligand specificity of these related

20

proteins has been compared in one study that sought to explain why mammals are able to

21

selectively reuptake α-tocopherol regardless of the mixture of dietary tocopherols ingested

22

(Panagabko et al., 2003). In humans, mutations in α-TTP can result in a neurodegenerative

23

disease called ataxia with isolated vitamin E deficiency (AVED) that causes a variety of

AC C

EP

15

3

ACCEPTED MANUSCRIPT

neurological symptoms (Ouahchi et al., 1995). While, of the four proteins compared, α-TTP

2

exhibited the highest binding affinity for α-tocopherol, promiscuity in ligand binding was

3

observed in the other three proteins, suggesting that caution should be exercised when attempting

4

to infer function of these proteins from single ligand studies. The SEC14 domain contains a large

5

flexible lipid binding hydrophobic pocket that is closed off by the N-terminal domain, and

6

docking studies suggest different lipids may bind SEC14 in several different ways (Saito et al.,

7

2007). Few studies have identified key residues that interact with particular ligands. However,

8

structural and mutational analysis of yeast SEC14 suggests that residues E207 and K239 are

9

critical for phosphatidylinositol head group binding (Sha et al., 1998), and AVED causing

10

mutations in human α-TTP are located in the binding pocket (Bromley et al., 2013; Min et al.,

11

2003).

M AN U

SC

RI PT

1

Besides α-tocopherol binding and transport by α-TTP, and phosphatidylinositol/

13

phosphatidylcholine exchange and transfer observed in SEC14p, it has been proposed that other

14

gene family members are involved in lipid-mediated regulatory functions in organelles and in

15

intracellular traffic (Saito et al., 2007). For example, members of the clavesin (clathrin vesicle-

16

associated SEC14 proteins; Katoh et al., 2009) gene family are found abundantly in mammalian

17

brain tissue on clathrin-coated vesicles that originate from the trans-Golgi network and form the

18

major protein transport pathway from the secretory system to endosomes/lysosomes (Katoh et

19

al., 2009). Experimental work by Katoh et al. (2009) indicates that clavesins specifically function

20

in neurons in a transport pathway between early endosomes and mature lysosomes and that one

21

of the molecules they bind is phosphatidylinositol 3,5-bisphosphate (PtdIns(3,5)P2). Mutations in

22

mice that deplete PtdIns(3,5)P2 produce neurodegeneration phenotypes with little effect on other

23

tissues (Zhang et al., 2007).

AC C

EP

TE D

12

4

ACCEPTED MANUSCRIPT

In insects, the function of only one member of the CRAL_TRIO domain protein, PINTA,

2

has been determined. PINTA or prolonged depolarization afterpotential (PDA) is not apparent

3

was discovered in a screen for Drosophila eye mutants that are deficient in rhodopsin biogenesis

4

(Wang and Montell, 2005). Visual pigments or rhodopsins are composed of an opsin protein

5

covalently linked to derivatives of Vitamin A (all-trans-retinol), such as 11-cis-retinal. Although

6

the cells that compose the vertebrate and invertebrate eye differ significantly, the function of

7

PINTA is somewhat reminiscent of human CRALBP, which is involved in the transport of the

8

visual pigment chromophore, 11-cis-retinal, in the photoreceptors of the human eye (Crabb et al.,

9

1988). Humans that carry mutations in the CRALBP gene may develop retinitis pigmentosa or

10

retinitis punctata albescence (Fishman et al., 2004). In Drosophila, PINTA is localized to the

11

retinal pigment cells, where all-trans 3-hydroxyretinol is converted to 11-cis 3-hydroxyretinal

12

(Pak et al., 2012; Wang and Montell, 2005). Ligand-binding assays indicate that PINTA

13

preferentially binds all-trans-retinol (Wang and Montell, 2005). It has been proposed that PINTA

14

either sequesters the 3-hydroxy version of all-trans-retinol in retinal pigment cells, generating a

15

concentration gradient that promotes uptake of Vitamin A in these cells, and/or that PINTA

16

facilitates the presentation of all-trans-retinol to proteins that participate in the oxidation of all-

17

trans-retinol to all-trans-retinal (Pak et al., 2012; Wang and Montell, 2005).

SC

M AN U

TE D

EP

In the course of annotating vision-related proteins of the hawkmoth (Manduca sexta)

AC C

18

RI PT

1

19

genome, we discovered that lepidopteran genomes appear to contain a large number of

20

CRAL_TRIO domain proteins. While many of these gene family members are given names in

21

GenBank such as 'α-tocopherol transfer (ATT) protein', 'clavesin' or 'retinaldehyde binding

22

protein', in fact nothing is known experimentally about the function of any of these proteins in

23

insects, except for PINTA. To help begin to bridge this gap in our knowledge, here we

5

ACCEPTED MANUSCRIPT

characterize for the first time the CRAL_TRIO domain in insects and its molecular phylogeny.

2

Specifically, we determine phylogenetic relationships between gene family members from

3

insects with reference genomes, including the newly sequenced genome of M. sexta. We find a

4

number of conserved sites within the SEC14 domain. Further, we see large variation in the size

5

of the gene family, with lepidopterans (butterflies and moths) having the highest number of gene

6

family members among the species examined. We also make use of RNA-Seq data to examine

7

the expression of gene family members in M. sexta.

8

2. Materials and methods

M AN U

SC

RI PT

1

We searched the predicted proteins of the reference genomes of M. sexta and Danaus

10

plexippus for all homologs of the CRAL_TRIO domain family using tblastn and query sequences

11

from Drosophila melanogaster, Anopheles gambiae, Apis mellifera, Tribolium castaneum and

12

Bombyx mori homologs, as well as reciprocal tblastn searches of the nr/nt or RefSeq RNA

13

databases (top hits with e-values of < 1 × 10-10). We then performed tblastn searches of a de novo

14

assembly of the M. sexta transcriptome (ArrayExpress accession number E-MTAB-2066; Smith

15

et al., 2014) to find mRNA contigs that were identical to the predicted peptides. Comparison of

16

predicted peptide sequences from the reference genome against the predicted peptide mRNA

17

sequences permitted us to identify and correct errors in the predicted protein sequences. Isoforms

18

found in GenBank were excluded from further analysis, with the first annotated transcript

19

selected for analysis. The identity of the CRAL_TRIO domain-containing proteins was

20

confirmed through a comparison of their SEC14 domains to the Pfam database of HMM-based

21

gene family models using HMMER (Finn et al., 2011). Amino acid sequences were aligned

22

using MUSCLE with default settings. Neighbor-joining analysis of 455 amino acid sites was

AC C

EP

TE D

9

6

ACCEPTED MANUSCRIPT

1

performed using the Poisson model and pairwise deletion in MEGA (Tamura et al., 2011).

2

Robustness of the phylogeny was tested using 500 bootstrap replicates.

3 2.1. 3D modeling and sequence logo of the insect CRAL_TRIO/SEC14 domain

RI PT

4

The three-dimensional protein structure of the SEC14 domain (containing the

6

CRAL_TRIO domain) was predicted using I-TASSER (Roy et al., 2010), which uses both

7

multiple threading alignments and ab initio modeling with further refinement to obtain the most

8

likely 3D structure. Several shorter sequences were omitted from this analysis in order to

9

minimize gaps in the alignment. Thus the 3D structure was predicted from the consensus

10

sequence of 159 insect SEC14 domains, producing a top model with a confidence score (C-

11

score) of 0.87. The three dimensional structure was visualized in Jmol (Jmol Team, 2002) and

12

secondary structures were predicted using Jalview (Waterhouse et al., 2009) from the alignment.

13

A sequence logo was created in WebLogo (Crooks et al., 2004) to examine conservation of the

14

SEC14 domain across insects. Highly conserved amino acid positions were determined as those

15

with a bit score of > 2.5 with a frequency of > 70% at a particular position.

16

2.2. RNA-Seq Analysis

EP

TE D

M AN U

SC

5

Bar plots were made using the R package ggplot2 (Wickham, 2009) to visualize the

18

expression levels of members of the CRAL_TRIO domain gene family in M. sexta. This data

19

was obtained from Smith et al. (2014) and consisted of mRNA extracted from the heads of four

20

male and four female M. sexta adults sequenced on the Illumina platform (50 bp single-end

21

reads), followed by de novo transcriptome assembly using Trinity (Grabherr et al., 2011) and

22

RSEM (Li and Dewey, 2011) to assemble contigs that represent mRNA transcripts and obtain

23

expression levels. Raw count levels were normalized by transcript length using the FPKM

AC C

17

7

ACCEPTED MANUSCRIPT

(Fragments per kilobase of exon per million fragments mapped). FPKM was further normalized

2

between libraries in the R package NOIseq (Tarazona et al., 2011), using the trimmed mean of

3

M-values (TMM) normalization method. Differences in gene expression between males and

4

females were tested using ANOVA, and p-values were corrected for multiple tests using a False

5

Discovery Rate (FDR) calculated in the R package qvalue (Dabney and Storey, 2014). Genes

6

were defined as significantly differentially expressed when the FDR was < 0.05.

7

3. Results and discussion

SC

RI PT

1

Our GenBank searches yielded 43 predicted proteins in the CRAL_TRIO domain family

9

from B. mori, 12 from D. melanogaster, 14 from A. gambiae and 18 from T. castaneum. We also

10

found 6 from A. mellifera, although this number may have been underestimated due to the poor

11

initial annotation of this genome (Elsik et al., 2014). Our searches of whole-genome sequences

12

yielded 42 predicted CRAL_TRIO domain proteins in the Manduca sexta genome (Table S1)

13

and 38 in the Danaus plexippus genome. A variety of isoforms were also evident in each species.

14

A comparison of the blast-identified CRAL_TRIO proteins to existing Pfam HMM protein

15

family models showed the top hit for each to be the CRAL_TRIO domain family (Pfam

16

accession PF00650). All but four proteins had top hit e-values of < 1 × 10-10, the remaining four

17

having e-values of < 1 × 10-5. M. sexta e-values of matches to the CRAL_TRIO domain family

18

are shown in Table S1.

TE D

EP

AC C

19

M AN U

8

The insect SEC14 consensus protein structure prediction included four β-strands (Fig. 1A

20

and B), compared to six in the yeast phosphatidylinositol transfer protein (Sha et al., 1998) and

21

five in the human supernatant protein factor (Stocker et al., 2002). The overall insect SEC14

22

structure resembled that of human and yeast SEC14 domain structures, which includes a deep

23

hydrophobic pocket with the β-strands making up the floor of the pocket. Sequence logo analysis 8

ACCEPTED MANUSCRIPT

of the SEC14 domain, in which the CRAL_TRIO_N domain is embedded (Fig. 1C), indicates

2

the presence of 18 highly conserved amino acid residues 12 of which are conserved compared to

3

yeast SEC14p and 6 of which appear to be conserved across insects (Fig. 1C). The majority of

4

these conserved residues (12 out of 18) were associated with secondary structures (α-helices or

5

β-strands). Gupta et al. (2012) surveyed CRAL_TRIO domains across a broad range of species,

6

including yeast, plants, fish and humans, yet only one residue (position D178 in yeast SEC14p)

7

is conserved in both their study and here, providing further evidence that the six conserved

8

residues found here could be insect specific.

SC

RI PT

1

Phylogenetic analysis of the gene family (excluding isoforms) suggests numerous

10

lineage-specific gene duplications in insects but especially in lepidopterans (Fig. 2). Many gene

11

duplicates are in adjacent positions on the same genomic scaffold suggesting they are tandem

12

arrays of gene duplicates (e.g., M. sexta genes Msex005959-Msex005962). In particular, nine

13

genes from the lepidopteran expansion are tandemly arrayed on a single scaffold in M. sexta

14

(Msex010487-Msex010499), of which four were expressed in the head tissues (Table S1).

15

Similar patterns were seen for B. mori and D. plexippus genes within the lepidopteran expansion,

16

and duplication patterns suggest independent expansion events in each species (Fig. 2). This

17

suggests an important role for gene duplication during the evolution of lepidopteran

18

CRAL_TRIO proteins. Orthologs of the PINTA gene in Drosophila were not found in any of the

19

three lepidopteran genomes searched. Additionally, we found evidence of mRNA expression in

20

M. sexta heads for 30 CRAL_TRIO domain genes (Fig. 3). The majority of these genes had

21

appreciable levels of expression and one in particular, comp17579_c0 (Msex0051227; Table

22

S1), was expressed at a very high level. Few differences in the expression of CRAL_TRIO

23

domain

AC C

EP

TE D

M AN U

9

genes

were

seen

between

males

9

and

females.

Only

a

single

gene

ACCEPTED MANUSCRIPT

1

(comp20575_c0/Msex014871) showed significant differential expression after FDR correction,

2

having a higher expression level in males (Fig. 3). Clavesins, which are predominantly associated with clathrin-coated vesicles in

4

mammalian brain tissue, contain a clathrin box motif, LLALD, which is putatively involved in

5

clathrin binding (Katoh et al. 2009). We found this motif, which is the result of a four amino acid

6

insertion in the C-terminal domain, to be conserved in three orthologous lepidopteran proteins:

7

M. sexta Msex016323, B. mori XP_004922581 and D. plexippus DPOGS213497 (Fig. 2).

8

Msex016323 (comp35578_c0) is also one of the 30 CRAL_TRIO domain genes expressed in

9

heads, though at relatively low levels compared to other CRAL_TRIO genes (Fig. 3). We did not

10

find this insertion and precise motif in any of the lepidopteran proteins found in GenBank named

11

clavesin-1-like or clavesin-2-like except for B. mori XP_004929842 and Msex010326

12

(comp20338_c0), which contained a related motif, VLALD/N, in a similar position in the

13

protein. Unlike comp35578_c0 (Msex016323), comp20338_c0 is expressed at appreciable levels

14

in M. sexa heads.

TE D

M AN U

SC

RI PT

3

It is notable that we did not find an ortholog of the PINTA gene in lepidopterans (Fig. 2).

16

Currently, the identities of the retinoid binding proteins (RBPs) required for transformation of

17

vitamin A to the chromophore remain unclear in most insects. In Drosophila, all-trans 3-

18

hydroxyretinal can be converted into 11-cis 3-hydroxyretinal by light but it is also the case that

19

11-cis 3-hydroxyretinal can be made from dietary Vitamin A. The fly chromophore does not

20

rapidly dissociate from the opsin after light excitation, whereas in butterflies it can (Bernard,

21

1983a; 1983b; Stavenga and Hardie, 2011). This suggests that in lepidopteran species,

22

production of the chromophore may be even more dependent on enzymatic processes than in

23

Drosophila. It is therefore possible that some of the newly-identified CRAL_TRIO domain

AC C

EP

15

10

ACCEPTED MANUSCRIPT

1

proteins identified here have taken on that role as retinoid-binding proteins involved in the visual

2

cycle. We surveyed insect proteins containing the lipid binding CRAL_TRIO domain, including

4

the newly sequenced M. sexta genome. We found that the CRAL_TRIO domain gene family has

5

undergone a considerable expansion in lepidopteran species and that a number of amino acid

6

residues within the wider SEC14 domain appear to demonstrate insect-specific conservation.

7

Further, the majority of these genes are expressed in the head tissue in M. sexta. Our results

8

suggest that the evolution of CRAL_TRIO domain-containing proteins might play an important

9

role in insect evolution and in particular in the evolution of lepidopteran species, which is

SC

M AN U

10

RI PT

3

potentially linked to the evolution of visual systems.

11 Acknowledgements

13

We would like to thank two anonymous reviewers for their constructive comments. This project

14

was funded by a National Science Foundation (NSF) grant: IOS-1257627.

EP AC C

15

TE D

12

11

ACCEPTED MANUSCRIPT

References

2

Bernard, G.D., 1983a. Bleaching of rhabdoms in eyes of intact butterflies. Science. 219, 69-71.

3

Bernard, G.D., 1983b. Dark-processes following photoconversion of butterfly rhodopsins.

4

RI PT

1

Biophys. Struct. and Mech. 9, 277-286.

Bromley, D., Anderson, P.C., Daggett, V., 2013. Structural consequences of mutations to the α-

6

tocopherol transfer protein associated with the neurodegenerative disease ataxia with

7

vitamin E deficiency. Biochemistry 52, 4264-4273.

SC

5

Crabb, J.W., Goldflam, S., Harris, S.E., Saari, J.C., 1988. Cloning of the cDNAs encoding the

9

cellular retinaldehyde-binding protein from bovine and human retina and comparison of

11 12 13 14

the protein structures. J. Biol. Chem. 263, 18688-18692.

Crooks, G.E., Hon, G., Chandonia, J.M., Brenner, S.E., 2004. WebLogo: a sequence logo generator. Genome Res. 14, 1188-1190.

Dabney, A., Storey, J.D., 2014. qvalue: Q-value estimation for false discovery rate control. R

TE D

10

M AN U

8

package version 1.36.0.

Elsik, C.G., Worley, K.C., Bennett, A.K., Beye, M., Camara, F., Childers, C.P., de Graaf, D.C.,

16

Debyser, G., Deng, J.X., Devreese, B., Elhaik, E., Evans, J.D., Foster, L.J., Graur, D.,

17

Guigo, R., Hoff, K.J., Holder, M.E., Hudson, M.E., Hunt, G.J., Jiang, H.Y., Joshi, V.,

19 20

AC C

18

EP

15

Khetani, R.S., Kosarev, P., Kovar, C.L., Ma, J., Maleszka, R., Moritz, R.F.A., MunozTorres, M.C., Murphy, T.D., Muzny, D.M., Newsham, I.F., Reese, J.T., Robertson, H.M., Robinson, G.E., Rueppell, O., Solovyev, V., Stanke, M., Stolle, E., Tsuruda, J.M., Van

21

Vaerenbergh, M., Waterhouse, R.M., Weaver, D.B., Whitfield, C.W., Wu, Y.Q.,

22

Zdobnov, E.M., Zhang, L., Zhu, D.H., Gibbs, R.A., HGSC Production Teams, Honey

12

ACCEPTED MANUSCRIPT

1

Bee Genome Sequencing Consortium, 2014. Finding the missing honey bee genes:

2

lessons learned from a genome upgrade. BMC Genomics 15, 86.

4

Finn, R. D., Clements, J., Eddy S. R., 2011. HMMER web server: interactive sequence similarity

RI PT

3

searching. Nucleic Acids Res. 39(Web Server issue): W29-W37.

Fishman, G.A., Roberts, M.F., Derlacki, D.J., Gimsby, J.L., Yamamoto, H., Sharon, D.,

6

Nishiguchi, K.M., Dryja, T.P., 2004. Novel mutations in the cellular retinaldehyde-

7

binding protein gene (RLBP1) associated with retinitis punctata albescens: evidence of

8

interfamilial genetic heterogeneity and fundus changes in heterozygotes. Arch.

9

Ophthalmol. 122, 70-75.

M AN U

SC

5

Grabherr, M.G., Haas, B.J., Yassour, M., Levin, J. Z., Thompson, D. A., Amit, I., Adiconis, X.,

11

Fan, L., Raychowdhury, R., Zeng, Q., Chen, Z., Mauceli, E., Hacohen, N., Gnirke, A.,

12

Rhind, N., di Palma, F., Birren, B. W., Nusbaum, C., Lindblad-Toh, K., Friedman, N.,

13

Regev, A., 2011. Full-length transcriptome assembly from RNA-Seq data without a

14

reference genome. Nat. Biotechnol. 29, 644-652.

17 18 19 20 21

http://www.jmol.org/.

EP

16

Jmol Team, 2002. Jmol: an open-source Java viewer for chemical structures in 3D.

Katoh, Y., Ritter, B., Gaffry, T., Blondeau, F., Hönig, S., McPherson, P.S., 2009. The clavesin

AC C

15

TE D

10

family, neuron-specific lipid- and clathrin-binding Sec14 proteins regulating lysosomal morphology. J. Biol. Chem. 284, 276464-27654.

Li, B., Dewey, C.N., 2011. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics. 12, 323.

13

ACCEPTED MANUSCRIPT

1

Min, K.C., Kovall, R.A., Hendrickson, W.A., 2003. Crystal structure, of human alpha-tocopherol

2

transfer protein bound to its ligand: Implications for ataxia with vitamin E deficiency. P.

3

Natl. Acad. Sci. USA 100, 14713-14718. Ouahchi, K., Arita, M., Kayden, H., Hentati, F., Ben Hamida, M., Sokol, R., Arai, H., Inoue, K.,

5

Mandel, J.L., Koenig, M., 1995. Ataxia with isolated vitamin E deficiency is caused by

6

mutations in the alpha-tocopherol transfer protein. Nat. Genet. 9, 141-145.

RI PT

4

Pak, W.L., Shino, S., Leung, H.T., 2012. PDA (prolonged depolarization afterpotential)-

8

defective mutants: the story of nina's and ina's--pinta and santa maria, too. J.

9

Neurogenet. 26, 216-237.

M AN U

SC

7

10

Panagabko, C., Morley, S., Hernandez, M., Cassolato, P., Gordon, H., Parsons, R., Manor, D.,

11

Atkinson, J., 2003. Ligand specificity in the CRAL-TRIO protein family. Biochemistry

12

42, 6467-6474.

16 17 18 19 20 21

TE D

15

structure and function prediction. Nat. Protoc. 5, 725-738. Saito, K., Tautz, L., Mustelin, T., 2007. The lipid-binding SEC14 domain. Biochim. Biophys. Acta. 1171, 719-726.

EP

14

Roy, A., Kucukural, A., Zhang, Y., 2010. I-TASSER: a unified platform for automated protein

Sha, B., Phillips, S.E., Bankaitis, V.A., Luo, M., 1998. Crystal structure of the Saccharomyces cervisiae phosphatidylinositol-transfer protein. Nature. 391: 506-510.

AC C

13

Smith, G., Chen, Y.R., Blissard, G.W., Briscoe, A.D., 2014. Complete dosage compensation and sex-biased gene expression in the moth Manduca sexta. Genome Biol. Evol. 6, 526-537.

Stavenga, D.G., Hardie, R.C., 2011. Metarhodopsin control by arrestin, light-filtering screening

22

pigments, and visual pigment turnover in invertebrate microvillar photoreceptors. J.

23

Comp. Physiol. A. 197, 227-241.

14

ACCEPTED MANUSCRIPT

1 2

Stocker, A., Tomizaki, T., Schulze-Briese, C., Baumann, U., 2002. Crystal structure of the human supernatant protein factor. Structure 10, 1533-1540. Tamura, K., Peterson, D., Peterson, N., Stecher, G., Nei, M., Kumar, S., 2011. MEGA5:

4

molecular evolutionary genetics analysis using maximum likelihood, evolutionary

5

distance, and maximum parsimony methods. Mol. Biol. Evol. 28, 2731–2739.

7

Tarazona, S., Garcia-Alcalde, F., Dopazo, J., Ferrer, A., Conesa, A., 2011. Differential expression in RNA-seq: a matter of depth. Genome Res. 21, 2213-2223.

SC

6

RI PT

3

Voolstra, O., Oberhauser, V., Sumser, E., Meyer, N.E., Maguire, M.E., Huber, A., von Lintig, J.,

9

2010. NinaB is essential for Drosophila vision but induces retinal degeneration in opsin-

10 11 12

M AN U

8

deficient photoreceptors. J. Biol. Chem. 285, 2130-2139.

Wang, T., Montell, C., 2005. Rhodopsin formation in Drosophila is dependent on the PINTA retinoid-binding protein. J. Neuroscience. 25, 5187-5194.

Waterhouse, A.M., Procter, J.B., Martin, D.M.A., Clamp, M., Barton, G.J., 2009. Jalview

14

Version 2-a multiple sequence alignment editor and analysis workbench. Bioinformatics

15

25, 1189-1191.

TE D

13

Wickham, H., 2009. ggplot2: elegant graphics for data analysis. New York, Springer.

17

Zhang, Y., Zolov, S.N., Chow, C.Y., Slutsky, S.G., Richardson, S.C., Piper, R.C., Yang, B., Nau,

18

J.J., Westrick, R.J., Morrison, S.J., Meisler, M.H., Weisman, L.S., 2007. Loss of Vac14, a

20 21

AC C

19

EP

16

regulator of the signaling lipid phosphatidylinositol 3,5-bisphosphate, results in neurodegeneration in mice. Proc. Natl. Acad. Sci. USA 104, 17518-17523.

15

ACCEPTED MANUSCRIPT

Fig. 1. Protein structure and conservation of the SEC14 domain in insects. The three-dimensional protein structure was predicted using I-TASSER and visualized in Jmol from two different viewpoints (a & b) using the consensus sequence of 159 insect sequences (α-helices shown in pink, β-strands in yellow and 310 helices purple). The sequence logo (c) was created in WebLogo from 159 insect sequences and includes the approximate positions of α-helices (blue bars) and βstrands (red arrows) numbered from the N- to C-terminus, the majority of which are labeled in a & b. The height of each stack is proportional to the sequence conservation, measured in bits, and the height of each letter is proportional to the frequency of residues at that position. The position of the CRAL_TRIO_N domain is shown (dashed red line). Black arrows denote highly conserved amino acid positions with a bit score of > 2.5 and frequency of > 70% at that site. Numbers above arrows show the position of that amino acid within the yeast SEC14p protein (NCBI accession: NP_013796). Red stars indicate conserved sites between the insect consensus and yeast SEC14p sequences.

RI PT

2 3 4 5 6 7 8 9 10 11 12 13 14

Figure and table legends

SC

1

16 17 18

M AN U

15

Fig. 2. Phylogeny of the insect CRAL_TRIO domain proteins. The neighbor-joining tree was constructed in MEGA with 500 bootstrap replicates. Bootstrap support values of > 80% are shown as dots on branches.

19

27 28 29 30 31 32

TE D

EP

26

Fig. 3. Expression levels of members of the CRAL_TRIO domain gene family in the head tissue of Manduca sexta. Expression level is the mean normalized FPKM for female (red) and male (blue) RNA sequencing libraries. Error bars are the standard errors of mean female and male expression across four biological replicates of each sex. Contig IDs are from the Trinity assembly. Significant sex differences are denoted by asterisks, with significance defined as a FDR of < 0.05.

Table S1. Top BLAST hits of M. sexta genome annotations, and corresponding de novo assembled mRNA transcripts, to B. mori and D. plexippus. Table includes the BLAST hit description, e-value, percent identity, accession and the top-hit e-value from the HMMER search of the Pfam database.

AC C

20 21 22 23 24 25

33 34

16

A

B ACCEPTED MANUSCRIPT α7" α3"

α10"

α1"

α9"

α6" α9"

α3" α8"

N

α2"

β1" β2" β3" β4"

β2" β1"

RI PT

α1"

N

α5"

SC

α11"

43

58*

4

A

VDQ A

E

M D Q

G Q

K K

1 0 3 3

A

E

G Q

FI Y A

TD

AS

V

Q

Y

NL P

D

V

A

V

Q

V

LD

SR

E

E F M

E Q I

K

N

Q

S

T

M

M

A R

Q

K N

D

K

T

A

NA

L

S

P R

T

T

S

C

L

F

V

T

A

L

M

G G

RG

R L RK L R K L R APRK

TL

A A K I I

G

E

V

α3"

T

D

S

C

I

P

I

ER

ID

R

T M

A

A K

C

M

A K

L

V H

G

F

V

Q

E

S

V

P

K

W

D A

E

F LSN F

DW

P

S A K

E N

G

R

Y

S

Y

S

C

T

M

VV

I

R M

Q

P N R

H

TS

F Y

A

A K

H

V

C

T M

I

V

PS Y

P

T

A

AL

L

SA

N Y F H

E

Q

T C

R

M

N

V

V

I

F

R

L

A

LFF VA

LS

I

C

M

T

Y

FV

AY

H

S

S

T C

A

I

LV

G

M

T

T

C

I

C

K

QV

N A D A Y

I

T

C

α4"

S

E

R

I

F

F

S A

V KM L I KLH L MN

L

V

E K K N G

Q

V

Y

C

M

V

A I N M SY

A

E

A

S

N F H

T

N

Y EL A I

L

D

A

L

A L

E

Q

R

VV

L

T

H

C

P

T

AL

SA

E K K N G

K

D

YYT

S

D

P

L

Y

V E MA

C

Y

A

S

Q

Q

TKN R LC

F

I

N S

K

HR

LQT Y DS ER NPQ S I C

H

S

I

S VT R

A

H

I

AR

SSD

V E E

E K F T

N

I

H

T

A

M

TL

A A K I

N

TY K AK A

N

I

A

QG

F

S

M

F

LT

L R

W

D

EM

F

A

K

A

VL N F

I Y

C

M

F

P S

R

L D

NA

Q

P

H

L

F

D T

L

V

T Y

C H

N

R

Y

D

I

QV

N A D A Y

I

T

Q

S

R

E

H N

L

A

Q

E

N

V

Y

S H E V

S E G

N

V

E

K

H PA S N G C

Q

A A R

D

F

Q

N L C

D

Q

P N KK

S

K

Y

T

K

YYT

V

K

F V YSC

V E MA

W K

H

A

Q

QQ

H

TS

L

LQT

S

P N R

Y

N

V

C

V H

Q

S

TKN R LC

F

Y

I

QV

N A D A Y

I

T

Q

S

F

V

L R L KK GR GDV R K A Y FFSF K HRL ENL S Y I P IH D C M L YN S I P M M C

Y

S

N

T

E

L

L

A

Q

R

N

F

LT

I

M

VV

P D A

T

Y

D

G

H

Y

N

AN

V

I

P I R T F VE

L D T

V

R

C

M

H N

R

T M

A

A K

A K

V

K

YSC

V E MA

E

α2"

R M A

Q

I

W K

H

A

Q

S

Y N

T

N

T

I

I

V

S

A

PELFD I

FAL

E PELFFD PMELFFED L I P V E

Q

A

N C M D Q Y

M L

H

E

H

K

H

Q

R

I

K

A

AL

L

SA

N

E K K N G

Y

ED S

Y

V

Q

V

Q

K N

Y

K K

H

S

M

H

V

T

S

T

V

S

P R

T

A

L

K

N

KK V

E

G

R

I

T

LD

A

E

V

D

D D

L

N

W K

H

A

C

M

A

R

V

V

K

G

Q

E

S

I

V H

S

F

N

I F

Q

Q

E

P

L

L

V

F H

E

Q

T C

R

M

N

V

V

I

YYTD R K

H A P R KK H LP EAYPGGR K H LP EAYPGGR K H ID QE

T P

T

F

N

S

C

S

V

G PI

G

Q

I

V I

I I I

AN

F AY T I

I

I VT I KT P E F P E S Q TSS

M

V

G G

T

R

G

A

M

T

Y

S H E V

C H

S E G

N

D T

V

N

L

H

R

Y

DP

N L FM

L

A

T

W

EM

G

R

Y

Y

S

T

G

T

L L L L

T

V

T

D

M

A

N

F

VA I

C

M

T

FV

AY

Y

H

S

S

T C

A

I

LV

SG

M

E

Y

H

K

T

A

A

A

R

S Q

Q

A I N M SY

D

L

T

H

L

P

Y

S

S

V

PS

C

L

RTQ

Q

P

Q

C

H

A

S

F Q E

A

N

P

S

P

N S

H

C

F

P

Y

I

S

A

A

C

I

M

E

K

I

L

H

T

D

V E E

I

S VT R

A

H

I

V

VH S

F

H

A

V

N

S A K

E N

S

C

K D

SA

H

A

F

P

E K F T

N

I

H

I

T V

F

T NWK AK A

I

A

QG

F

T

F L

TQ

VS

K

V

T

A

VL N F

I Y

S

C

M

N V

G

S

S M

V

C

M

M

F

C

R C

Y

G

E

S

A

V

S

S

T I

D

V H

Q

V

V

A

Y

H

T

Q

S

I

S

F

G K VM

A F

D

E

M

T

AY

TG C E

F

Q

S

AM

C

FV

L GVI

V

P

LA

A

Q

S

MY

F

F

A

K

Y

C

R

V

H F

L

Y

F

P S

R

T

Y

T

KKH PA Q S N G C

A A R

D

V

I

S T

K

P D A

S

Y N

A

F

P

S T

D

E

T V M

Q

R

Q

S

A

E

Y

P

Y

T M

A

M

K

V

F

I

K

T

I

E

N

Q

I

RG

LV L V AA

N

A

A

W K

H

A

A

C

E

KK

C

C

A

A

N

T

N

L

C

T

E K

L

T

H

C

S

L

D

H

S

L

V

R

P

I

S VT R

H

H

M

Y

M L

N

Y

A

A

F

Y

Q

R T

T

P

L

K

G

Y

S

H

T

T

S

S A K

R

F

S

K A D S L F A V I M L I A RV A I M MG H E P A R VDA DA H V Q F M R M K T S

E

I

A

S

F

A

N

T

C

G

W

V

R

V

T

P S

K

R

A

G

N

E

H

R

Q

T

V

L

Q

S

R

T

W

T

Y

S

N

T Y

N

L

L R

Y

D

N

KN A

P

P

C

M

S

C H

D T

N

R

E F

A

Y

V

S E G

V

A

E

F

G

M

R K

D

L D

T

Q

N

PI V T G RI HT G RVTI P I E TP HE HE E TP

A A R

P D A

N D A

S

Q

I

N

A Y

K

I

V

K

K

N

Q

M

H

G

A

E

Q

I

H T

L

E

I

F Y

RA

E

D

C

E

K

I

A

V

T

T

I

Y

L

N

N

AT

V

D LREP

Y

T

Q

L

R

R

F

A

P

R

S

Y

D LP DKD R LPH KD R H

N

T

C

N

I P S V S LT G D S G N T Y V

A

T

T

L

QA

D

F

Y

C

I

A

G

M

V

N

H

S N

I D D L SN M SS EA LE M F N P R

L

A

FT

H

E

RY Q

V

T G

D

A L N

V S

S

I

H

R

A

R

T

V

Y

ND MI N EP D IN P IN P

I

L

M S A

C

E R

L

H

Q

P

KV

P

A

V

G

YNA DSW A

L L E F

R

L

D

Y

Q

G GK GK GK N

G

G PI

C

S

R

T

D

A

F

E Q R

T

F

T

E

LK

W WC

N

Y

L

V

A

Y

AS

Q

A

S

A

R

S

T

T Y

N

E K

K

A

C

F

A

RDP

V

N

H S K

D

Y

M

R

K

S

V

N

KQ PHLP KS R K

N

R

F

L

M

L

H

A

G

A

G

I

T

V

QA

AT

V

Y

F

R

S

L

T

S

K

S

D

E

A R

C

Q

A

T

Q

I LD A V N E RG LV I FA E F VL N K VL S R T QA A KKH K N AN P AA NWK Q I TD Y KA F I PA LV AL I AA F AY K Q DE L LA D K S I V SN D TEV K T I N L TL S I SA R T V G DSRK T M N V I K TQY EM I C VK T SSQ AA E S E PS N E E Q M H Y F M P NANV A EW D Y F A N L K SG SS V E ER P N E M E M RH QG M F N I A F Y S G K H E S P QQ F N R L F DYQ V I S K V QV M AQ CT V P KN I VV R V R H L NA L LN LT F N YS A D I A N A RE M S I M R A TS R E R Y T K F R

L

A N

F

Y

I

R

T

I

FT

R

A

E

Q

K G S N V

A

T

T

D

G

E P I P IK

L TA N I SQ E K A D LKEK Q L G AV

VH S

A RWDD K L R PWS E

R

S N

D

A

F

T

Y

A

S

R KK DP QE

I PKE

DL

K

GI

RV

I

V G D S A I YN VL L A G D A I F V L AD L N F KL RG G TA A K A F S VD A AS L DS I FVM L VL F WE SL AN R TV TN Q V M AHSVF QV KL KNY I Y E F I K N YL KH AVA A WKKN A L I L L NWKF M Y E AA K R FA I P A F M A FA A AY D Q AV I I D L I F L K D A I MG S I F SN A K S I P T N L FV T A F I QV T V G M I K E C V AN T SS F R D K F LK GV A VDA E E Q DA M T M P E A EV H WE M H F Y N F VNW WD M P T N LE SG E V G R M VS QM M F N L SY E C Q T N I S F Y M P S N L F A N R Y

H S M K

Y

M E

A I

Y A H

G

L

R

R

T

I

M

S

Y

Q

V

FT

S

Y

V

M

S

L

Y

Q

SS

R A

AA

T

G

T

Q

A

T

M A

R V

D

L

ATV AW I F QV K VM V

Q

Y

M

A

G

A

Y

L

R

T

I

A

K

S

V L

A RD

G I

PQ S F G

D

A

T

V S

C

LDEDL T

MG

F Y

L

L

P

I

F Y

RA

E

N

D

G

A L N

M S A

P

T

D

P

I

T

H

Y

A

H

K

C

I

L

L

A

V

M

E

T

N S

G A

S

I

AY V AA G L E W T Q R

Q

Y

L

R

G

K

T

V S

C

D

L

M A

R

A

I

I

S

Y

D N

V

FF

L

I

E

L G

W

VY KC K VT VM

0

SS

F

R

N

A

V

AA

T

M H

E

G

Q

T

Y

Y

N

β3"

KPFL S

G

T

VN A I

FM

A

V

VL

S

K

R

T

AY

α9"

KE

T

V

G

N

R

PQ S F

M

A

S

G

Y

I

K K

V

L

A RD G

D

A

G Y

A

F

N

M

T

S

T

K

I

L

S

N

W

M

F

T

R

N

V

QV

G

T

H H

DK

A L DA W

N

W

F T

S

C

KKV

F

QED A

V

M

N

L

N

S

P

A F

R

I

Q

T

A

Y

K

A

W

K

V

T

N

I

S

T

E

E

S

R M

Q

T

I

A

ES Y

E

N

N LD G SK T S D T D I GM SE F TGLYN

KT

H

F

S D

Q

E

L

Q

K

N

N

NA

R M

M S

V

T

E

P S

E

I

I

K

K V

C P Q

R

β4"

T

Q

F

Y

K

I

V

V

L

S

S M

T

V

I

K

Q

N

M

R

V

VPK

Q A

R

S Q

I

A

A

A M

T

I

D

A

D

F

I

L GVI

K

D

D SA

S

K

F

F Q E

A D

Q

I

L

A

P

F

Q

Q

T

S

S

K

Q

S Q

N R

D

A

VPDKD

I

A

T

D

EI

P Q

K

P

T KG D A N A E E S L L R D

K

D

I

Q

S

S

E

K

DL

E A

M

N

A

A RF

V

A V

I

H VA L

L

N

S

T

C

KY

RSM

A

E

H

K

A

D

H

I

M

N

I

S L F

V

A F

C

T

H

K

T

N

A

I

Y

H F

S

P

F V R

G

S

E D

A

V

M

L

N

I

P

T KG D A N A E E S L L VR H

H T N E H Y T RTQ P V C

H

E

Y

A V

N

H

K

T

M

N

D

C

M

H VA L

E

T

H

K

T

S

S

I

A

Q

S

S

H

K

N

R K

V

E

F

A

L

A

T

R

C

V

L

A

R

I

G

G

A

E

T

Y

Q

P

Q

T

A

S

M

V

N

P

LP EEYYGG LP GG A

T

L

N

A

S

E

E

T

F

V

N L

D H

K

A

A

F

I

K V

I

F Q E T

A

L

H

T

F FL

C

T

D

E

S

P

E

D

Q

K

P

G

GM LY F E I KV N NA A P M S I

K

M S

V

V

I

K

VH S

F

H

A

C

T V M

S D

K

SA

F

H

A

A

T

M

C P Q

I

T V

KKVI PKE QED

F

A

T

F L

TQ

VS

D

V

K

C

Q

E

G

S

S

Q

A

S

S

L GVI

D

S

S

N L FM

T

A

V

L

S M

A

R

T

D

H

I

N

V

D

I

S

M

N

T KG D A N A E E S L L R

A F

C

T

F V R

G

A

E

M

T

V

R

Q

H

I

P

E

A V

I

H VA L

H

E

L

H

D

A

T C

T

C

Y

A

E

T

MY

A

AY

K

T

F

S

M

I

A

D

H F

P

S Q

V

Y

P

A

S

VMK

G

V

S

S

E

H

P

C

V

F

AM

D

E

T

Q

A

T

AC

H

L

G

I

I

T

M Y Q R QY P F

V

A

I

Q

T

C

Y

S

T

K S

A

L

N

P

L

F

S

K

A

K

A

N

KT

R

F

QV

G

N

I

K V

E

N

Y

α10"

S

E

I

Q

HV L I Y V

N E

M KK SL MA

R

R

A

L

H

D A

H

S V V T R

A

V

M

F

Y

K

I

IR EV

T

A

T

T

F

F

S

T

A L

R

T

R M K

L V

D

L

N

M

T V M

N

C P Q

S Q T

W

N

Q

E

L

A M

Q

R T

A

F

T

T

A L DA W

I

Y

S

P V S

I

T

R

VY

S

TQ

G

S

I

H

C

K

Q

Q

A

T

AG R

H

H

T

F

A

N

K RI K RI R

S L

S M

F

S

A

Q

E

Q

Y

W

N

Y

A

F

A

V

K

T

R

A

E F

S A Y K

E T T Y V A G KT V F SY AM M GV L FQ S D K F K A A K L FQV E D SY L M I I F A I V M I I K N KV L N I L M A RV M H A AM A I N F MG T VA AM R H E AV F TQ P AA H R F M VD P DA FYF H I W V VS V Q F A M M MY S I R M K T S

F G V

G

M

P

G

M

F E

N

V

I

T

F

LGV

A A YD Q

N

I

I

K V

P

KL

T

E

W

NA

E

T

R T

W

N

Y

A L DA

S

GM LY

S D

N

C P Q

R M

M S

Q

N

R

F

F

P S

M

K

V

G

A

D S F A V I M L A RV A I M MG H E P A R VDA DA H V Q F M R M K T S

F

D

E

M

V

Y

V

H A N Y

F

E S

P

D

E

A

K

R

I

A Y

H

R

E

I

Q

QV

G

T

D A T

L

T

KT

K

D

W

N

W

S

N

R

Y

A L DA

I

R

C H

S E G

N

T

V

V

V

Q

K

C T V M

Q

E

V

N

D

S

H

Y

M

T

R

S

R

Q P D A

T

I

Y

T

SP LM I NK

Q SV

M

H

A MM

Q

T

H

A

L

W

S

G

D

Y

LF LF I IV VL F I R

K

M

I

K K

Q

R T

Y N

L

A RD

M S H I A A YT R

K

S K TQ T G S G

W

T

E

E

PQ S F

L

V

S

L

L

F

M

A

V

W

S

Y

S I Y P K

H

A MM

Y

G

T

G

R

T

Q

AA

V

M

T

N

K

T

Q

A

L

T

M

S

A

S

M

L

V

V

E

T TQ

H

I

T

T

Y

R

N

G

H

D

K

LT A

N

Y

V L

GA R

E

A

E

N

S

SS

I

C

V

P

V

Q

M H

G

K K G

L

R

F

KL G I W G VY KC K VT VM

F

S N

Y

M S

SA

Y

V

V

A

Y

V

T

L

K

M

Q S

Y

N

1

P

I

Y

T

E

T

M H

L

A P

L

V

V

A

F

R L

T

M

I

Y

E

D

M

N

K

Y

Y

R

T

V

F

N

Q KL VFRA I I S VL DA YN G I A L G WA F D T AGY SW A G AK DM TC V VA PQ K QR Y M T L L V E WM F AH I F S KY S QV MM F D K AV VM V L V A

A

N

A

M

A

G

K K

P H Q

I SV

I

G

I F L V E

Q

N

L

R

V

T

M

N

DSW A

KV

V

Y

T

V

F

S

S

S

L L E F

V

C

S

R

L

A

Y

Y

S N

A

H

Y

R

N

M

T

N

K

A MM

Q

E Q R

T

V

R K E

E

R

T

L

W

L

T

L

V

T

K

Y

H

K

I

A

V

S

M

G G YNA L

K

M

L

Y

G

C

N

K

RA

E

N

R

L

V

S

Y

VL G

A

AT

V

N

E

Y

I

V S

C

ED

KL G I W G VY KC K VT VM

M H

D P

E

L

L

P

QA

V

C

H

L

L V PF VI I LV PFLV I I L PFL T

M S A

F

Y

F

V

Y

F

I

L

S

A

R

A

R

T

A

V

P

V

L

E Q R

T

R

A

Y

Q

I

H

V

N

H

K

K

K

L

R

L

bits

I

R K V

M

H R R T

N

S

F

H

A

F

Y

bits

I

DY LQ

R

QG

R

T

L

A

ECFK KD

I VT I KT P E F P E S Q TSS

Q

S

I

A

V

R

T

W

C

L

ER F V Q

Q

V

E

Y

D A

T

G

TS

Y

N

LQT

S

P

L D

E

F AY

N

L

Q

C

S

R

V

RE R

V

S

F

Y

L R

W

LV L V AA

S

N

S

A

K

F

E

NA

Q

D

C

S

N

T

T

D

S

P N RH KN R LT

T T K P F LL 102* C D R P K R LGG L I VM KL Y N LD E I V K L A L Y F R P L D K R G R K V E G D KR SR Y L ENFF I PK T T IH K C Y S I I E Y I M A

G

E

T Y

N

E

DKEN T LE QAE D S GYS A

K

P

A

KQN GPHLP S R K

N

R

A

C

Q

N L C

D

E

FKNLF A

A

K

I

R M

Q

E

P

G AN N

Q

Q

IL

S TR

D

A

H

Y

N

T

T

QEV

QA

AT

V

D A

T

G

LI D YYT

L R L KK GR GDV R K FFSF K HRL ENL Y I P I HASY YN D C M L S P M Y I M C

TL

A A K I

L D

E

F

R

LT

I

M VV

N 0 4 2 2 4 3 T L L R P I R F LVF LL 1 D V CR LDRR E I QEG D N 1 KK DKFKNL A L G VL K L DE R I R K V Y VGK PEL TE G ELT NYKHA GL DK VMSK S DG L NL RL F KR D DL NLF K Y FDE ECF H E L I SF FF YN E VV Y D T I I LF S S I S EN Y I P I M TRL D I A A L D L I LLC Y 3 YV L S Y I I N P E E P 2 F M Y L G M M AN V C 0 0 L M L AKL L V VTL 2 L YL Y L N 1 I LM K I FC G I A QF GM C E TQV I V E N S TQF DLSFS I 4 L H QV VF R I I ED I I 4 T I VR LM I SLβ1" V L A MR Y KRW N I NM α4" L α5"TVL α6" L V SLGM I Q I QRM I F I YL P N T F L L C F P IL D LL 1 R D VV C E D R N D L A L 0 TD Y K V I K L LV I N K V PENYK G ELY GGD L NLSKF F DL KFQELF 3 E 178* I Y VV LLD M I I 3 S S 210 I V A A L L Y P 172 213 0 4 2 2 208* 4 3 L M L VTL L YL TG L V L Y V VF KL 1 I LM K I P I F I DA 1 DF S FNLL A QF GN CD E Q I V EA L MFL TR CR DLSF I V L H Q TD I E I VVGL SQ RN I E VLVK KF I D I EL L LVN I K VR I T I VEK I MNL KL V Y AL G LL TD NR GGL MR Y YK F D NMLDP D KFQELF YL LS E I V I M YS LYL M I WSV T L M I V QL I I I QR S P SF 3 N I V A A L L 2 L C Y P 0 0 F V 2 N GS E F Y 1 I FN I V Q MK Y NLDS H T A K K I REM F SP S I FN E GN 4 DI T P D E K H DP KD V 4 I RSL I N P QK VL I TT L VKK D I H K FL SL EM S LE I Y TL SRS G F R V KV V L L L LF V V L L YL A V F Y AKL M L T 1 I K I F L F G I S F A L G C E TQ I 0 I V Eweblogo.berkeley.edu M TQ I NLT VVV L DLSFSQV QL I L H LRM I ED I I VR I M 3 MRA Y KRW N I NM L S I 3 V SLGM LYL T Q I QRM I F P N LI C 0 2 2 N C 4 V α6"I FF α7" F VF N β2"L L LY α8" L VT GS β3" E Y L 1 S IL Y L Y V Q MK NL AK F ME KHVTL I 1 DF KGM I K SP NHE I A L SL E P F I R F I L G I D GN D A TNHS P CL F L Q V KM F I E E T K L H D K V D Q I V L I PRM Q Q N Q V TNF I S TT V S K L E DVR I I I DR SL I RS V E I E V I I LT MK VV II N SD RA LG I MK Y L KR SR G RL YM TF V S265/66* VL 3 L K VN LYL M I SL WS I T M Q I QL 260/61* L PF N A F LIM C 0 weblogo.berkeley.edu 0 243 2 216* 229* 252 4 F V K F N GS E F Y 1 S I Y V N H F E T MSP A LD K I I S IR M F N K VL S I E GN Q DI T P D E H DP KD VN QK I E L VKK D I H K I NK EM FL SL R LV 3 LT T P L S RS I Y TG F R V KV LS L A F 0 weblogo.berkeley.edu 2 VL G

bits bits

Q

I

FLR CK S E AK FLR CK S E AK G RV R FLR CK S E AKD GFLRRV CKRS E AKD

L E

L R

W

P I R T F VE

R K V

L D T

R

R

I D D L SN M SSV EA LE M F N P R

S

D

E

V

I

V

M

N

T

L PE I K

L

A

F

S

Y

DA

ATV AW I F QV K VM V

bits bits

I

L

I

Y

C

A

E

G

K K

N

H

R

I

FT

H

S

V

L

E

G

H R

V

Y

NA

D

D

E I KDL I

KK V

A

T

A

VDQ

KD

DY LQ

S N

G

E

W

NA

Q

KR

ECFK

L

G

H R R T

N

F

Y

D

E

D

A

K

P

A

N N

Q

L L L LH

F

Q

E

N L C

D

Q

T T

S TR P

83*

62*

L R KQN GPHLP CRAL_TRIO_N domain KS I

R

S

A

T

R

Q

R I EE

T

Y

D

W WC

N

H

T

E

LK

H S K

R

A

F

D

AG

H

Y

N

R

R

D

P I R T F VE

L D T

M

N

T

DKEN T LE QAE

RE R

V

C

F

A

D

D S GYS R AQ

M

M

S

H

A

K

Q

N

A R

Q

L

F

A RS L P

E I KDL I S

S

E

bits bits

I

Q

K N

T

T

S

E F M

E Q

K

V

P R

T

LR WL K LRRD WLG K LRRD WLG K

DA

E

K K

S

R K V

V

DY LQ

R

H R R T S N

F

α1"

PELFDLI A

M D Q

S KN

L

I

D

A

A

G

AL V

VDQ

NA

N

S

F

DI R I EE

A RS L P

KD

D DD LPDD LPDD

KR

ECFK

QG

R

A

C

F

T

T

M D Q

bits bits

RE R

V

S

H

Y

S KN

bits bits

Q

S

L

V

F

N 4 2 4 3 1 3 2 0 2 1 4 4

M

A R

Q

S

A

KQN GPHLP KS

DKEN T LE QAE D S GYS A

M

EP

bits bits bits

T

S

N

K N

V

P R

T

I

K

T

S

E F M

E Q

E

S KN

E I KDL I Q

DA

TE D

AL V

R I EE

A RS L P

0 3

LR WL K

DI

2

AC C

bits

3

M AN U

C

1 4

β4"

β3"

S

P

C

T

D

H

F V R

I

K

T

E K

GS

P K

G

P T

D A N A E E S L L R

D

K

D

I

Q

S

S

N

G

E

I

α11"

weblogo.berkeley.edu

C

908 930 4 004 629 XP x00 279 Bm s Mse S201 97 G M 72 94 1 O DP P 3 40 8 Dp Ag X FA0 378 5 E 2 Tc XP 6 2327 39 7 0 Am NP 310 9076 0 4 Dm XP 93 13 19 Ag 004S20 91 60 0 8 3 XP OG P 6 59 82 5 4 Bm DP m N P 62 Dp D g X XP A Am

Ms Manduca sexta Dp Danaus plexippus Bm Bombyx mori

Bm XP 004922800 Bm X P 0 0 4 9 2 2 7 6 2 Ms Msex005131 a Ms Msex0 Bm XP 0 05131b 0 4 9 2 2763 Dp DP Dp D OGS212706 Ms M POGS2127 05 Bm sex005 Bm XP 0049 127 X 2 2 P 758 Ms 00 Dp Msex 49227 Dp DPO 00512 57 4 Ms DPO GS2 Bm Ms GS 12755 Tc XP ex01 21275 4 A XP 00 195 D m X 96 492 9 Ag m N P 0 7589 7864 M X P 06 B s P 65 565 Dp m XMse 318 1882 388 Tc B P x0 02 XP F0 00 019 7 1 4 00 01 928 52 18 6B 97 10 2F 1 29 09 0

ACCEPTED MANUSCRIPT

SC M AN U TE D EP

AC C

DP Dp OG Bm DP E S2 X OG HJ 14 Dp Ms MP 00 S2 72 230 9 Dp DP se 492 142 49 D OG x0 9 29 Ms POG S2 038 378 Bm M S 14 04 XP sex 214 232 0 2 Dm 0049 038 33 0 Ag ABV 2924 5 8 8 X Ag P 32 2392 Dm XP 31 0369 8 NP Dm 6 052 NP 1051 Tc X 72481 0 Dp E P 97310 3 HJ 6 Dp EH 78510 J7851 Bm XP 1 0049 Ms com 29247 p5 Bm XP 0049 6142 29246 Ms scaffold0005 3 Bm XP 004929244 Tc XP 968368 Dm NP 730931 4 Ag XP 00343615 961 Ms Msex0058929 0492 Bm XP 0 HJ74795 Dp E 63328 5 6 P 006 Am X EHJ790832 Dp P 9732 Tc X 9703185 P 35 Tc X 01040 5959 0 NP Mse00 9084 m 7 B Ms EHJ 9085 0 Dp EHJ7 0596 1 3 Dp sex0 289 M 049 s 0 M P X Bm Dp

Bootstrap support >80%

64 02 09 1 31 18 65 93 7 2 6 XP P 9 310 S20 29 6 g A c X XP OG 059 883 4 T g DP ex0 492 63 A p s 0 29 D s M P 0 049 860 M m X P 0 000 B m X sex 311 B s M J75 460 M EH 969 77 Dp XP 3120 13 Tc XP 5723 61 Ag NP 0008 Dm Msex 00862 5 Ms Msex0 92963 Ms XP 004 2 Bm HJ7531 2928 Dp E P 00492 6 Bm X P 00492963 Bm X J70837 Dp EH 4929639 Bm XP 00 63 Ms Msex0008 Tc XP 966309 Tc XP 008197657 Tc XP 971470 Tc XP 971527 Tc XP 970129 Dp DPO Bm XP GS206990 Ms Ms 004928861 Bm X ex010502 Dp D P 00493 Ms M POGS2 3930 Bm sex0 06941 Bm XP 00 10499 Bm XP 0 49313 Ms XP 0 04931 68 367 Bm Ms 049 M XP ex00 3142 Ms s Ms 004 4519 2 M M ex 928 B s M sex 010 857 Ms m X sex 010 488 D M P 0 012 487 s 0 D p 1 Dp p E EHJ ex0 492 01 D D H 79 10 905 p P J7 2 49 EH O 9 91 2 4 J7 GS 292 Lepidopteran 92 20 89 69 expansion 88

7 2 49 05 10 29 x0 9 98 4 4 se 0 0 96 63 M 0 01 4 s XP sex 010 288 9 M x 9 Bm s M se 004 1048 5 M s M XP x0 049 M se 01 Bms M sex M sM M

Dp

Clathrin box

T Dp c X Bm D P P 0 X O 08 Bm Ms P 0 GS2 19 3 M 0 XP se 49 013 74 x0 30 4 1 Dp Ms 0 M 04 0 90 5 Tc DPO sex 93 629 6 Am XP GS 014 091 5 1 0 2 Dm XP 0 081 013 871 93 4 0 N Ms P 00 6566 2177 Bm Mse 126 455 0 Dp XP 00 x0107 581 DPO 49 0 GS 2562 2 Ms 2 5 Bm Msex0 15399 X 1 Dp D P 00492 4334 POG 5 S200 626 Dp D 02 PO Bm XP GS21539 2 8 00 Ms Mse 4925639 x01070 1 Tc XP 9723 45 Ag XP 317327 Bm XP 004931769 Dp DPOGS207391 Ms Msex002032 53 Bm XP 004922581 3 32 16 x0 Ms Mse S213497 Dp DPOG 87951 0016 Ag XP P 648579 Dm N 300003 304 PSCF p D D Msex018 557 Ms XP 315 939 Ag P 523 155 N 6 Dm GS20 9842 DPO 00492103262 p D XP ex0 031 Bm s Ms 0325 49212 M P0 97 8 X XP 209 42 Am Tc XP 3 6510 001 0 8 Ag NP F30 892 2 6 A C 2 PD PS 049 059 250 9 Dm p D P 0 ex0 931 703 08 D X s 4 0 3 PINTA M 0 4 Bm Ms P 0 GS2 00 X O ex s P m B D M Dp Ms

RI PT

Tc Tribolium castaneum Am Apis mellifera Ag Anopheles gambiae Dm Drosophila melanogaster

Contig ID 0

co

0 56 14 2_ c0 mp 65 60 co _c mp 0 67 17 co _ c0 mp 75 98 co _c mp 0 85 81 co _c mp 0 85 92 _c 0

_c

52 7

54

mp

co

_c

0

0

_c

69 3

50

mp

co

mp

co

72 9

45

_c

0

0

_c

57 8

35

mp

co

mp

co

45 1

34

0

_c

43 9

27

mp

EP TE D M AN U

SC

RI PT

400

co

0

_c

28 3

24

mp

co

0

_c

44 2

23

mp

co

0

_c

57 5

20

mp

co

0

_c

38 7

20

mp

co

0

_c

33 8

20

mp

0

_c

89 6

_c

0

300

co

mp

co

87 5

18

mp

co

17

0

_c

81 3

17

mp

0

_c

AC C

200

co

mp

0

_c

57 9

17

mp

co

51 2

17

100

co

0

0

_c

42 8

17

mp

co

9_ c

23

73

0

_c

11 6

17

mp

co

mp 1

co

0

_c

49 6

16

mp

co

0

_c

14 7

16

mp

co

_c

0

0

5_ c

49 0

13

mp

co

mp

co

19

6_ c

99

24

mp 1

03

mp 1

co

co

Expression level (FPKM)

ACCEPTED MANUSCRIPT

500 Female Male

*

0

ACCEPTED MANUSCRIPT

Highlights

• •

AC C

EP

TE D

M AN U

SC



The insect CRAL_TRIO domain containing proteins contain 18 highly conserved amino acid residues. We present the predicted 3D structure of the consensus insect CRAL_TRIO containing SEC14 domain. The CRAL_TRIO domain containing gene family shows a large expansion in lepidopterans with lineage specific duplication events. Thirty of the CRAL_TRIO domain containing proteins are expressed in the head tissue of Manduca sexta.

RI PT



1.00E-134 2.00E-133 1.00E-179 0.00E+00 4.00E-161 5.00E-158 2.00E-156 4.00E-117 5.00E-106 2.00E-165 5.00E-131 2.00E-92 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 6.00E-173 2.00E-83 1.00E-65 1.00E-97 5.00E-72 3.00E-100 2.00E-141 2.00E-86 1.00E-121 1.00E-83 4.00E-129 0.00E+00 0.00E+00 6.00E-132 3.00E-66 0.00E+00 7.00E-174 0.00E+00 5.00E-32 2.00E-76 8.00E-101

Identity Accession 85% XP_004929634 61% XM_004929578.1 65% XP_004929635.1 65% XP_004929639.1 78% XM_004928914.1 88% EHJ76881.1 71% XP_004929378 69% XP_004929248 75% AK382842.1 54% XP_004931422.1 54% XP_004922757.1 71% XP_004922758.1 60% XP_004922800.1 53% XM_004922706.1 93% XP_004928836 85% NP_001040355 78% XP_004928931 92% XP_004928929 81% XM_004928871.1 90% XP_004930908 91% XP_004930906.1 71% XM_004929785.1 42% XP_004928863.1 38% XM_004928806.1 46% XM_004928806.1 37% XM_004928806.1 45% XP_004928863.1 58% XP_004928863.1 43% XM_004928806.1 53% XP_004929052.1 44% XM_004928806.1 56% EHJ79293.1 80% XM_004925582.1 84% XP_004925625.1 61% XP_004927864.1 43% XP_004928863.1 82% XP_004925626.1 75% XM_004930854.1 92% XM_004922524.1 88% XP_004933182.1 52% XM_004929189.1 45% EHJ78511

ACCEPTED MANUSCRIPT

Top hit E-value of hit to Pfam CRAL_TRIO domain Amino acid sequence PREDICTED: alpha-tocopherol transfer protein-like [Bombyx mori] 4.40E-23 MPDQEIMQCVDLVTIKDWLSKQPHLPHDVDDILLLRFLTSCKFSLERTKRTIDFFFTIRHNAPELFCKRDPWAPEIRRVFEITDMLPLPNKTKENYKVFIYRLKSPELD PREDICTED: Bombyx mori uncharacterized LOC101746774 (LOC101746774), mRNA 2.00E-19 MSNITSQDIQYIRDWMAKEDYLPKDFSDMMIMKFLHSCYGSLEQTKQCIEKFCVGRLNMPEIFTNRDPLSPKLQTAFSITNMASYNAGECEILLYQIDDPTLENFDFY PREDICTED: uncharacterized protein LOC101746774 [Bombyx mori] 6.30E-20 MAVEMTNISNNDIQFIRDWMAKEDHLPKDFTDMMIKKFLHSCYGSLEQTKQCIEKFCAGRQNMPELYTNRDPLSPRVLTANSITTVATYNIEGDEILIHQLYDPNLE PREDICTED: alpha-tocopherol transfer protein-like [Bombyx mori] 4.80E-26 MDDKISDEVRKQLEDMKQWASTQPLIPKDIEDRILLRFIHSCYYDLEKAKVAADLFFSIRSASSELFTNRDPLAPAMQKTLQIVNLAQYVTSKNRNIWIWQLNDPGL PREDICTED: Bombyx mori alpha-tocopherol transfer protein-like (LOC101736577), mRNA 2.80E-24 MPVELEFDYDEATASMDKFSQDDIDELRAWTEKLDKSKCVPKDLTDKQLLLFYNACYGDMDKTKTCMEKYYACRKNAPEFFDNRVFNTQEMKSAQDVLEFSFL cellular retinaldehyde-binding protein [Danaus plexippus] 1.30E-34 MTEYAEYDWKLHQMQNERLCEAARKRLTDGDRQRALVALKDTVDSSLNLALRDKSRCQDDNFLRRFLYARKHDVQQSFELLVRYHQHRREHPELWATADGGV PREDICTED: alpha-tocopherol transfer protein-like [Bombyx mori] 1.40E-28 MSIRPLNPQLAEKAKKELFEDPSRIAADVQHLKEWIAKQPHLRARTDDQWLVTFLRGCKYSLERTKEKFDLYYSVRSTAPELFRIHINHAMFDEILSLGSMLILPKLA PREDICTED: clavesin-1-like isoform X1 [Bombyx mori] >ref|XP_004929249.1| PREDICTED: clavesin-1-like isoform X2 [Bombyx mori] 4.30E-28 MIRELPSDLAKIAKEELNEKPKRTKDDVKHLKDWIAKQPHLNARTDDQWLVAMLRGCKFSLERVKQKLDLYYTLRTTAPEVTLRILPTEDKFVEFFKLGSCLVLPK Bombyx mori mRNA, clone: fepM03J23 3.90E-20 MAFLQGPSLKQAEVIKQELGEGPGDLERGVRDLRSMLAASPHLPQPEAFDRHLLELFVRGCRMDLDRARSKLEAFCLARARHSDLYEHRSFTEPPLNDICKFMDIA PREDICTED: uncharacterized protein LOC101744959 [Bombyx mori] 1.40E-24 MPQDTQDLVQLNPDSVKYIRKLYGFEDEKKLEEAIDILDAWIKKQNHFTKKDFPRKYLESTILANKGSIERSKSKLDKLCTLRAILGFFFKFSNIRTDFSNFFKATTLIP PREDICTED: clavesin-2-like [Bombyx mori] 5.60E-13 MALKQISIQREYEKNPEISPEDIAQLREWLVTQPHLPGELLTDLDLLLVYHCCERSMQVSKQVLDLHYTLRTMFTQFFHNRSVDKKIEFTLDTVLICPLPTRSTDGNK PREDICTED: alpha-tocopherol transfer protein-like [Bombyx mori] 1.60E-21 MTADVVQPLPLEEEYKKKTRITPEDIKKFRQWLETQPHLPSDYITDMDLILTYHCCNRSSEVSKQVLDLNFTLRTLFTNLFKDRHPDDKMMKAFSYYLLLPLETRAH PREDICTED: clavesin-1-like [Bombyx mori] 1.30E-19 MTIINSYQQKINMSRLESFPLEEEYKKPTGITPQDIARLRQWLATQPHLPHEHITDLDLILLYHRYGCSIEVSKQELDNQYTLRTLFASFFKDRVLDEKIETVLNTVLIT PREDICTED: Bombyx mori alpha-tocopherol transfer protein-like (LOC101746971), mRNA 2.10E-19 MNSYVARSFPLEEEYKKPTGITPQDIARLRKWLATQPHLPQHITDLDLILSFHSCKRCMETTKKLLDTHYTMKTNFDAIFKNRIVDDKIELVLKRVLLNPLPTRTKDG PREDICTED: uncharacterized protein LOC101741307 [Bombyx mori] 3.50E-22 MGCWMKSQPHLPHIPDEYIFLFLHSNYYKLKETKDTIECYFTLRANTPDLFTNRDPLTPKNKTIMDITHMVALPNKTTEGYHVLLYRLADFDYSKLNFADAVRVFC cellular retinaldehyde-binding protein [Bombyx mori] >gb|ABF51246.1| cellular retinaldehyde-binding protein [Bombyx mori] 1.20E-28 MCNMKIKQELLVQPTGELWKAIRVELNEDVANNDRDLAAIREWLKKQPHLPDDWETGPLMTFLRGSSFSLEKCKRKLDMHFTMRAACPEFFTKRDASSPELSEIL PREDICTED: alpha-tocopherol transfer protein-like [Bombyx mori] 3.20E-29 MGINETSKFCGRQARKTNEPKMAVSLVQPSGEMWKKIRVELNEDINTRERDLATIKEWLRKQPHLPNEWDDNCLMTFLRGSSFSLEKCKRKLDMYFTMRAACPE PREDICTED: alpha-tocopherol transfer protein-like [Bombyx mori] 4.10E-28 MTLEQPAGEMWQKIREELNENVDTRDADLAHIKEWLKKEPHLPDEFDDQRIMTFLRGCKFSLEKTKRKLDMYFTMRTAVPEFFTNRDVARPELQEILNIVQMPPLP PREDICTED: Bombyx mori uncharacterized LOC101742743 (LOC101742743), mRNA 3.10E-09 MFGIVFNKQHNITHLSPKSFRMSPAASFAERWARDNLTRDPTPDELEVLGLFPITCDGDEIEASKRARRYYKARGPGGLTELWHKRHPDDEDILSSCQDVYIVPLRA PREDICTED: alpha-tocopherol transfer protein-like isoform X1 [Bombyx mori] 7.00E-44 MPAEAETRIVLRDVTPEISQHMLTDLEKLPGVQLGDFLLQFELDEPRESVREIARTELRETPEVVKPAIEELRRLLQEDKDLRVPLDSEAWMIRFLRPCKFYPESSYDL PREDICTED: clavesin-1-like [Bombyx mori] 5.40E-41 MASSEFRLERSVELLPETKEIAEKELRETPERVREALERLRELLKENKDLYFADDDELLTIFLRPCKWYPESALALMRRVAEFKRENASLLDGLLPEHEKDAFLEHK PREDICTED: Bombyx mori clavesin-1-like (LOC101736368), mRNA 4.30E-27 MTTASILMTVTTTWNVTNEKHEKDGGRLGHTLRIVVADAVKELRETSTIKDQTLHIMRDWIQQNNDIVNVRQDEAFLLRFLRHKKYSVPMAQQTLLKYLSLRKYY PREDICTED: alpha-tocopherol transfer protein-like [Bombyx mori] 2.80E-19 MDFFDDVFLKLHPNTVTTVRKLYNLEKREDMDRAIDALEDWLKKQYHFTKTDYTRDFLERSIIMTKGSVERAKTQISNLCSMRAVYPKFFEKCDARNDFNNLDSIA PREDICTED: Bombyx mori alpha-tocopherol transfer protein-like (LOC101745141), mRNA 3.30E-21 MDSINDILLKLHPDTVTTVRKLYNLEKREDLDRAIDALEDWLKMQHHFTKRDFPRDFLERSIIMMKGSVERAKYQISNLCSMRAMYPKFFDKYDAKNDFNKLDYF PREDICTED: Bombyx mori alpha-tocopherol transfer protein-like (LOC101745141), mRNA 1.10E-17 MESLPKDPILEFNDNTFKYMRKEYKFEKNEDMNAAIDAFEDWIAKQNHFTKKKYPRDYLERLIILQKGSVERAKTHMEKLCTLRTLLPQFFEHYDVKKDCIPFINGA PREDICTED: Bombyx mori alpha-tocopherol transfer protein-like (LOC101745141), mRNA 6.00E-19 MDAVPEDKLLRYTPDTMQALRRLYNLDKPGDMDFAITALQDWLKKQHHLTKKDYSRSYLERLIILQKGSLERSKAQIENICAMRTLLPEYFEINGDEEIVKLDGLS PREDICTED: alpha-tocopherol transfer protein-like [Bombyx mori] 9.30E-17 MESLPSDPILEYNDDTFESLRKQYNFEKKEDMDAAIDVFEEWLKKQNHFTRKEYPREHLERLIILHKGSVERAKTHMDKLCTLKTLLPHFFEHYDVRNDCASYIND PREDICTED: alpha-tocopherol transfer protein-like [Bombyx mori] 2.40E-23 MEELPRDKILTYNPDTLKAVRYEFNLENPGRLEEAINILEDWIKKQDHFVKKDFDREYLERTIVTTKGLVERAKARMDKLCTFKTLLPNFFQILEEKRHQSIAEKLSH PREDICTED: Bombyx mori alpha-tocopherol transfer protein-like (LOC101745141), mRNA 5.00E-18 MQAIPKLPILEFTDDALESIRKCHNLHDPGRMEEAVDILCEWIQKQEHFLRKDFPRDYLERIIISTKGSLEKAKVRLDKICTMKTLAPKYFQDQDLRDVKHVLDGFKF PREDICTED: alpha-tocopherol transfer protein-like [Bombyx mori] 7.60E-19 MESLPCNPILKFKPDTLDVIRKSFNLDQRELNEAIDRFEEWLNKQEHFVKKDFSRDYIERTIIQGKGSVEKAKSQMDKLCTLKTLLPKFFKNCDIKTEFKDISQIAWLI PREDICTED: Bombyx mori alpha-tocopherol transfer protein-like (LOC101745141), mRNA 9.20E-20 MEELKPTNILKFNPNQMADVRKVFGYGDIGILKQDIKILEDWINKQNHFIVRDFGRDYLERFLIYNKGSVERAKKHFDRLCTFRNLMPEFLRDFDVRNEFVDLFKCS putative CRAL/TRIO domain-containing protein [Danaus plexippus] 5.60E-18 MERPPHHPLVIVTEDDIKAIREIYNIGDENKINESLNAIQEWIQKEEHLVEASKYLSRRNLERVLLIAKGSVEETKRRIERLLTARGMIPELCLNKTLEEFEDIWDAVNY PREDICTED: Bombyx mori clavesin-1-like (LOC101744403), mRNA 6.30E-27 MKTDIKMWSFEEIAFQAEMNRHEDPDSEEDAFVLCGEDPTTRNKKIVEFRNLIKERNECHPHRTDDAYLLRFLRARQSIPARAHRLLVRYCNFREANPHLWRDVDW PREDICTED: clavesin-1-like [Bombyx mori] 6.70E-31 MPFIDIAFQAEISRYEDPEFEEFAKKNCNENFETRGKAIDELRRMIREKGECCPRRLDDAYCLRFLRCRRFIPALAHKLMVRYEEFRHKNAYLYNCDAFGLQKVKDV PREDICTED: alpha-tocopherol transfer protein-like [Bombyx mori] 1.70E-17 MTVSQIAMEIEYNKETNIKPEVILRLREWLQKQAHMPHDHITELDIILAYHCCDCDAEITKRVIDLNFTARTLFSFYQNREINYSLETALHTWLVTPLDAATNKGYRP PREDICTED: alpha-tocopherol transfer protein-like [Bombyx mori] 2.00E-15 MGSCDDELLKLHSDTVSTLRKLYKLDKREDMDRAIDALEDWLKKQYHFTKREYPREYLERSIIFMKGSVERVKLQISNLCSMRTIFPNFFEKVDAINDFVHYEDPV PREDICTED: clavesin-2-like [Bombyx mori] 2.50E-26 MYECFLEIAYEAELNTEENEEMLELAKELCNENPATRATAITDLRQMIYERGECRPWRTDDAFLLRYLRARNFVVPRAHKLLVRNCVFREQHNQLYEGVDLWGLV PREDICTED: Bombyx mori alpha-tocopherol transfer protein-like (LOC101742992), mRNA 9.70E-36 MPTPDYSVDIDLGEPPPELEDYARQHCGEDPSTKLQAIDELRDMIYERGECTPHRMDDEFLVKFLRARNFIPQRAHRLLVNYYQFKEENPELFDNVYPLDLQGIGDA PREDICTED: Bombyx mori retinaldehyde-binding protein 1-like (LOC101739669), transcript variant X1, mRNA 3.30E-30 MATTVLQPEECRREHEPKPNIIDSPRLTDDERVQILEAKERETGDLPPELREKARMELREEPALREQALAQMRHFIDKHPAIKKCRTDAPFLLRFLRTKKYSIPQACSM PREDICTED: alpha-tocopherol transfer protein-like [Bombyx mori] 7.00E-11 MRHKEIYGVNVHPTIKFALDFGMALISEKIRKRVKLYTSIEDVEIDKSVLPKEYGGVMPMRQMI PREDICTED: alpha-tocopherol transfer protein-like [Bombyx mori] 2.10E-12 DQWFLAFLRGCKHDVEKSKRKLEKYYTLKKSAPDFFANRNPTDEKIQALLKLGLPLRQCVKEDSPRTCLVRVSQLCNSQYSLVDLTKIGFMTAEILLLEDDNFTVV hypothetical protein KGM_17942 [Danaus plexippus] 3.10E-25 MTVRALSSVLRKKAEEEVNETPNVADAAIAQLRDWLHQKPHLRAAKIPDQWLIAFLRGSKFNINKAQEKLDTYYTLKTIAPEIAANRHPFDASIQEILKIGFFLPLRK

RI PT

E-value 0.00E+00 2.00E-125

EP

TE D

M AN U

SC

Manduca sexta Trinity mRNA contig Expressed in head? yes no yes yes yes yes no no yes yes yes yes no no yes yes yes yes yes yes yes yes no no no no no yes yes yes yes yes yes no yes yes yes yes yes no no yes

AC C

Manduca sexta gene

Msex000860 comp16496_c0 Msex000861 N/A Msex000862 comp34451_c0 Msex000863 comp17428_c0 Msex001952 comp17116_c0 Msex002032 comp23442_c0 Msex003804 N/A Msex003805 N/A Msex004308 comp18896_c0 Msex004519 comp8581_c0 Msex005124 comp17875_c0 Msex005127 comp17579_c0 Msex005131a N/A Msex005131b N/A Msex005929 comp8592_c0 Msex005959 comp6560_c0 Msex005960 comp50693_c0 Msex005961 comp45729_c0 Msex005962 comp54527_c0 Msex006294 comp24283_c0 Msex006295 comp6717_c0 Msex010326 comp20338_c0 Msex010487 N/A Msex010488 N/A Msex010489 N/A Msex010492 N/A Msex010495 N/A Msex010496 comp17512_c0 Msex010497 comp17813_c0 Msex010498 comp13490_c0 Msex010499 comp16147_c0 Msex010502 comp20387_c0 Msex010701 comp173239_c0 Msex010702 N/A Msex011959 comp7598_c0 Msex012101 comp124195_c0 Msex014334 comp27439_c0 Msex014871 comp20575_c0 Msex016323 comp35578_c0 Msex018304 N/A MsexScaffold00053 (961518-965039) N/A N/A comp56142_c0

ACCEPTED MANUSCRIPT

RI PT

NDKRWAVDDSLRPLNSKDDRVDKVKELPGSFRSLALD NSVEESREFYLDDNLWKAIPTKKSKAMENAMNGSFRTLSID WKGLAAKKSKSADNAMNGSFRTLSID VFMKKSLDLFSKNREHFLNDNLWRVDKKSKNKDVSTEIGSFRSLAID KFYETVIPRSSLPPDHGGTLPDTQTLHKKCMQQLQLLEPYFKAEEEQRINALPDKKREKAMEKAFKNLDID ERIVLHGNNLSTLHDALPLDILPAELGGEGPSYNSERWLQEFCRSENINPAPVTPVTTAASDDNDLPSAKLDKAYKQKDNANFSFNGSEKSAKSELLRDKD IIDYWKKKVQEYRVFLEEDQTFGTDESKRPGKPKTAEDMFGVEGSFRQLQFD HIPKEIVPTEYGGDGGPLVDIIEYWVSKIKEYQDWMRQEVKFGTDESKRAGKPMVKEMAGLDGSFRKLEID LLEHISASVLPQEWGGEEEPIDVLVRKWRQRIDESREFLRDLNELCCAVPAQIADAEVYGTMGSFRKLDID SYKEIKDSWIELLSSKESMEYLKASREATVNEAYRLKSDYNEHLGLAGTFRTLSID ARPNNVDNIFGIQGSFKKLDID MEALPKEAGGQYKDFHTAQADILAKIKGSPAFFAEENKKRVKEAVRPGKPKTITDIFGGVEGSFKKLEID YKTLEELREETQAMMRANKDFFVEENKKRVKESLRPGKPKTITDIFGSVEGTIFLKQG SFHSCKRCMESTKKLLDTHYT LIETEMFKADEKKRIKKPSKGMFASFTSSFKSLDID DCSTMEALHEQWTMKLQEYRDWFKRQESIKANEELRPGKPTNYDELFGIDGSFRQLTID PFLKEKMKNRIFFHTDITELQKYIPKEIIPIEYGGNGCSMEEVNIAWMKKLEEYIPWFKEQENIKADEALRPGKPTNYDELFGIDGSFRQLSID YVPKDMLPAEYGGNAGPMNDLHNAWVKKLGEYTDWFTENENIKANEALRPGKPTNYDELFGIDGSFRQLVID GAMKDLNDEWTKKLLSNAKWLQTEERKFYDTDLRPEVRTRKNKNAMRTLRSNIGSYDMITRAHSCRSLHTRDDLDEGLYGAYRTLKVKPDKFYV EKLRSRIIFMGNDRDLLHKHISPKCLPESYGGTLTIPSVTGAQWLELLLMCDREFMAINSYGYKKK HGSKMSSLHKHMSPSHLPADYGGELPAIDYSGAEWYPVINDILPHFDKMELQLNEDGKPYVMW FAISKVTPKMAERLHIYTNLKQLHKQTDAECLPTIYGGPIELQNMIVYTKQLLGEQRKKVLALDEMLITNTRGIISSRAKNVKSDTTSVEGSFRKLEID IAVLKEQWKDLLGTKENVNYMQEMSNAVSNEAYKIKDKYNLDGAG MAVLQGMNNGKMFLVRKKTLITCKR ESLQVTHDKWIDVLSSKEFVEFIQVMNQGRTNESLRLNDTVKEQYMGIAGTFRTLSVD ADRHMKWVQALTSKKMTEHLKEMSEAKSTQGYKMKDTISDPDLGISGSFRSLNVD QESLESIYEKWLDVLTTKEFMEYLQVMNQARSREDLRLNDTFKEEYMSIAGTFRTLSVD TLYDYVPKSMLPSEFDGDGKSLRELHQQWIEVITSEEFTQYLKEMNTATTNEAYRQTDKFNEQYLGMPGSFRTLSVD KSIAELHDKWFEVLNSKEHREYIKETVKAGTNESRRLINKLNDEYLGMPGTFRVLKID VEELNKKWMEIISTEEHTAYMKEMNGACTDESRRSKLDFNNEYIGMPGTFRTLTVD KELSEANFKELSTDEHIARLKFLEKAATNEDLRLSCKFNEEYSGLPGAFKTLCVD MSCDKLNELWKETLKSEESQKIIRDTEKLVSDETKRHESKFNEEYMGMPGSFRRLTVD KVYMHGYDITSLHRYIDPECLPKRYGGYRDPVSLERWLTKIKEYKNKSFDNDMRNLGYGVD RHVVTTEEWLAKVYKYRDDFLVQELRDLGFYVK KANKDFFEMESKKRVDETKRTERPKTISSFFPNMEEAFKKLEID

HGSDLKSLQQHISLDCLPPRYGGTCRSHVSFAHWLKKVKKYRDEKFDREMKDLGYVIKE VWPDYPYTIWFESLRKNYHVAKEMVACGYKFREEEISADVVRKLKEDGIQLS CTHFVNIPQYGVRFFEFAVSLLSDKLKDRVMFHRACDDLTKHVDPAILPKEYGGTVPLKDMIEELKRNLLKHREELLALDDLSIDLYALEKNDLTQDIHSTAGSFRKLELD

AC C

EP

TE D

M AN U

SC

QDLIDHWKMKVETYSDWYAQHDSIRPDESKSSDAKNSANVLGIVNCFKKLDID

Molecular evolution and expression of the CRAL_TRIO protein family in insects.

CRAL_TRIO domain proteins are known to bind small lipophilic molecules such as retinal, inositol and Vitamin E and include such gene family members as...
4MB Sizes 0 Downloads 6 Views