This article was downloaded by: [National Pingtung University of Science and Technology] On: 03 January 2015, At: 21:42 Publisher: Taylor & Francis Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK

Bioscience, Biotechnology, and Biochemistry Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/tbbb20

De novo assembly, gene annotation, and simple sequence repeat marker development using Illumina paired-end transcriptome sequences in the pearl oyster Pinctada maxima

Yuewen Deng a, Qiannan Lei a, Qunli Tian a, Shaohe Xie b, Xiaodong Du b, Junhui Li a, Liqun Wang a & Yuanxin Xiong a

a Fishery College, Guangdong Ocean University, Zhanjiang, China
b Pearl Research Institute, Guangdong Ocean University, Zhanjiang, China

Published online: 22 Jul 2014.

To cite this article: Yuewen Deng, Qiannan Lei, Qunli Tian, Shaohe Xie, Xiaodong Du, Junhui Li, Liqun Wang & Yuanxin Xiong (2014) De novo assembly, gene annotation, and simple sequence repeat marker development using Illumina paired-end transcriptome sequences in the pearl oyster Pinctada maxima, Bioscience, Biotechnology, and Biochemistry, 78:10, 1685-1692, DOI: 10.1080/09168451.2014.936351

To link to this article: http://dx.doi.org/10.1080/09168451.2014.936351


Bioscience, Biotechnology, and Biochemistry, 2014 Vol. 78, No. 10, 1685–1692

De novo assembly, gene annotation, and simple sequence repeat marker development using Illumina paired-end transcriptome sequences in the pearl oyster Pinctada maxima

Yuewen Deng1, Qiannan Lei1, Qunli Tian1, Shaohe Xie2, Xiaodong Du2,*, Junhui Li1, Liqun Wang1 and Yuanxin Xiong1

1Fishery College, Guangdong Ocean University, Zhanjiang, China; 2Pearl Research Institute, Guangdong Ocean University, Zhanjiang, China


Received March 14, 2014; accepted May 8, 2014 http://dx.doi.org/10.1080/09168451.2014.936351

We analyzed the mantle transcriptome of the pearl oyster Pinctada maxima and developed EST–SSR markers using Illumina HiSeq 2000 paired-end sequencing technology. A total of 49,500,748 raw reads were generated. De novo assembly generated 108,704 unigenes with an average length of 407 bp. Sequence similarity searches against known proteins revealed that 30,200 (27.78%) and 25,824 (23.76%) unigenes were homologous with sequences in the non-redundant protein (NR) and Swiss–Prot databases, respectively, and that 19,701 (18.12%) of the unigenes were possibly involved in 234 known signaling pathways in the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. Ninety-one biomineralization-related unigenes were detected. A total of 1764 simple sequence repeats (SSRs) were identified, and 56 primer pairs were randomly selected and tested in a cultured stock. The rate of successful amplification was 64.3%. The developed molecular markers should be helpful for further studies on genetic linkage analysis, gene localization, and quantitative trait loci mapping.

Key words: Pinctada maxima; Illumina paired-end sequencing; de novo assembly; EST–SSR

The pearl oyster Pinctada maxima is one of the most important molluscan mariculture species in the southern provinces of China. This species is cultured mainly for the production of highly valued round pearls. Its hatchery production was first achieved in 1970, and the species has been cultured in China for approximately 40 years. Over the last few decades, many researchers have developed protocols for artificial seed production and for larval, juvenile, and adult cultivation.1) Breeding programs have also been initiated by introducing breeder stocks from Indonesia.2,3) Despite its economic importance, comprehensive knowledge of the molecular mechanisms involved in pearl formation and development remains limited.

*Corresponding author. Email: [email protected]
© 2014 Japan Society for Bioscience, Biotechnology, and Agrochemistry

Transcriptome analysis is a powerful approach for examining complex molecular mechanisms, and numerous technologies have been developed to analyze and quantify transcriptomes. Traditional Sanger sequencing was used initially; however, this approach is costly, time consuming, and sensitive to cloning biases because it involves cDNA library construction, cloning, and labor-intensive sequencing. Transcriptome analysis based on next-generation sequencing (NGS) is more attractive because it is cost-efficient, high-throughput, and rapid. NGS platforms, including the Roche/454 Genome Sequencer FLX Instrument, the ABI SOLiD System, and the Illumina Genome Analyzer, have been used in many research areas, including resequencing, microRNA expression profiling, DNA methylation analysis, and de novo transcriptome sequencing of non-model organisms.4,5) Recently, the transcriptomes of several pearl oyster species have been analyzed using NGS, which has helped to elucidate the molecular mechanisms of pearl formation and development in these species.6–13)

Molecular markers have broad uses in genetic research (e.g. construction of genetic maps and gene mapping), breeding (e.g. molecular marker-assisted selection), gene cloning, and comparative genomics. Simple sequence repeat (SSR) markers are generally preferred over randomly amplified polymorphic DNA (RAPD) and amplified fragment length polymorphism (AFLP) markers because of their many advantages, which include codominant inheritance, abundance, dispersal throughout the genome, multi-allelic variation, high reproducibility, and a high level of polymorphism.14,15) SSRs can be divided into expressed sequence tag (EST)–SSRs and genomic SSRs according to the sequences from which the simple repeats are identified.
EST–SSRs are derived from expressed sequences, which are more evolutionarily conserved than non-coding sequences; therefore, EST–SSR markers have relatively higher transferability than genomic SSRs.16) Compared with genomic SSR markers, EST–SSR markers may help to identify candidate functional genes and increase the efficiency of marker-assisted selection. In addition, the traditional methods used to isolate and identify genomic SSRs are costly, labor intensive, and time consuming.17) Advances in large-scale RNA-seq provide many opportunities to identify and develop EST–SSRs by data mining with bioinformatic tools.

In this study, we used Illumina HiSeq 2000 paired-end sequencing technology to characterize the mantle transcriptome of the pearl oyster P. maxima and to develop EST-derived SSR markers. This study aims to obtain extensive novel information about the P. maxima transcriptome and thus promote a better understanding of the molecular mechanisms involved in pearl formation and development. It also aims to develop valuable SSR markers for marker-assisted selective breeding of this species.

Materials and methods

Experimental animals. In April 2010, an oyster stock was established from breeders introduced from Indonesia. In December 2012, 30 animals were sampled from the stock. The mantle tissue was dissected and immediately flash-frozen in liquid nitrogen. The mantle of one animal randomly selected from the 30 was used for cDNA library construction, and the mantle tissues of all 30 animals were used for SSR validation.

RNA extraction and construction and sequencing of cDNA library. Total RNA was extracted from the mantle sample using TRIzol reagent (Invitrogen) according to the manufacturer's protocol. The cDNA library was constructed as follows. Poly(A)-containing mRNA was purified from the total RNA using Sera-mag Magnetic Oligo (dT) beads. Fragmentation buffer was added to break the mRNA into short fragments (200–700 bp). Using these short fragments as templates, first-strand cDNA was synthesized. Second-strand cDNA was synthesized using buffer containing dNTPs, RNase H, and DNA polymerase I. The synthesized cDNA was purified using a QIAquick polymerase chain reaction (PCR) extraction kit (Qiagen) and ligated to sequencing adapters. Fragments of about 200 bp were excised from a 2% TAE-agarose gel (Certified Low-Range Ultra Agarose, Bio-Rad) and recovered using a QIAquick gel extraction kit (Qiagen). The DNA fragments were then amplified by PCR, and after 15 PCR cycles the amplified library was sequenced on an Illumina HiSeq 2000 sequencer (Illumina, Inc., USA) at BGI-Shenzhen (Beijing Genomics Institute). The expected library size was 200 bp, the read length was 90 bp, and the sequencing strategy was paired-end sequencing.
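Before assembly, the raw reads from this library were quality-filtered by three rules, listed under Transcriptome analysis: discard reads containing adaptors, reads with more than 10% unknown bases, and reads in which more than 50% of bases are of low quality. The authors used an in-house C script that is not described further; the Python sketch below only illustrates those three rules. The adaptor string, the Phred+33 quality encoding, treating quality ≤10 as "low quality", and keeping a pair only when both mates pass are all assumptions.

```python
# Minimal sketch of the three read-filtering rules (not the authors' C script).
# Assumptions: Phred+33 qualities, "low quality" means Q <= 10, adaptor
# contamination is detected by exact substring match, and a read pair is
# kept only if both mates pass.
ADAPTER = "AGATCGGAAGAGC"    # assumed adaptor prefix, for illustration only
MAX_N_FRACTION = 0.10        # rule 2: at most 10% unknown bases
LOW_QUAL_CUTOFF = 10         # rule 3: bases with Q <= 10 count as low quality
MAX_LOW_QUAL_FRACTION = 0.50  # rule 3: at most 50% low-quality bases


def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Convert a FASTQ quality string to integer Phred scores."""
    return [ord(ch) - offset for ch in quality]


def passes_filters(seq: str, quality: str, adapter: str = ADAPTER) -> bool:
    """Return True if a single read survives all three filters."""
    if not seq or len(seq) != len(quality):
        raise ValueError("sequence and quality must be non-empty and equal length")
    upper = seq.upper()
    # Rule 1: discard reads that contain the adaptor.
    if adapter in upper:
        return False
    # Rule 2: discard reads with more than 10% unknown (N) bases.
    if upper.count("N") / len(seq) > MAX_N_FRACTION:
        return False
    # Rule 3: discard reads in which more than 50% of bases are low quality.
    low = sum(1 for q in phred_scores(quality) if q <= LOW_QUAL_CUTOFF)
    return low / len(quality) <= MAX_LOW_QUAL_FRACTION


def pair_passes(read1: tuple[str, str], read2: tuple[str, str]) -> bool:
    """Keep a read pair only if both (sequence, quality) mates pass."""
    return passes_filters(*read1) and passes_filters(*read2)
```

A read of 40 high-quality bases ("I" = Q40) with four N's (exactly 10%) is kept, whereas the same read with eight N's is discarded by rule 2.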

Transcriptome analysis. The images generated by the sequencer were converted into nucleotide sequence data by a base-calling pipeline. Sequence data were saved in FASTQ format and filtered for downstream analysis using an in-house C script, which mainly comprises three filtering steps: (1) discarding reads containing adaptors; (2) removing reads in which unknown bases account for more than 10% of the read; and (3) removing low-quality reads in which more than 50% of the bases have a quality value ≤10. All subsequent analyses were based on the filtered (clean) reads. The clean reads were assembled into unigenes using Trinity,18) TGICL,19) and Phrap.20) RPKM (reads per kilobase per million mapped reads) values were calculated to represent unigene expression levels.

Unigenes were aligned to the National Center for Biotechnology Information (NCBI) non-redundant protein (NR) database (http://www.ncbi.nlm.nih.gov), the Swiss–Prot protein database (http://www.expasy.ch/sprot), the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database (http://www.genome.jp/kegg), and the Cluster of Orthologous Groups (COG) database (http://www.ncbi.nlm.nih.gov/COG) using BLASTx with an E value < 1e-5. When the results from different databases conflicted, the priority order NR, Swiss–Prot, KEGG, and COG was followed to determine the sequence direction of the unigenes. Unigenes with an assigned direction are given from the 5′ end to the 3′ end; for unigenes without an assigned direction, the sequences output by the assembly software are given.

SSR marker identification and primer design. Unigene sequences were searched for potential SSR markers using the MISA tool (http://pgrc.ipk-gatersleben.de/misa/) with a minimum of 6 repeat units for di-, tri-, tetra-, penta-, and hexa-nucleotide motifs. Primer pairs were designed using Primer 3.0 (http://fokker.wi.mit.edu/primer3/). Microsatellites flanked by at least 50 bp of sequence on both sides were considered potentially amplifiable loci and were used for primer design.

SSR amplification and polymorphism testing. Thirty samples were used for polymorphism validation. PCRs were performed in 10 μL of reaction mixture containing 30–50 ng of genomic DNA, 0.5 μM of each primer, 1 μL of 10 × PCR buffer (10 mM Tris–HCl (pH 8.3), 50 mM KCl, and 1.5 mM MgCl2), 0.2 mM of each dNTP, and 1 U of Taq polymerase. PCR began with an initial denaturation at 94 °C for 3 min, followed by 30 cycles of 94 °C for 30 s, the annealing temperature for 30 s, and 72 °C for 45 s, and ended with a final extension at 72 °C for 5 min. PCR products were separated on 8% (w/v) polyacrylamide gels with a pBR322/BsuRI marker (MBI Fermentas) and visualized by silver staining.

Data deposition. The raw reads of P. maxima generated in this study were submitted to the NCBI Sequence Read Archive under accession number SRP041153.

Results

Sequencing and assembly

A total of 49,500,748 raw reads were generated from the pearl oyster mantle tissue. The GC content and Q20 percentage were 43.22 and 95.84%, respectively (Table 1). Assembly of the 49,500,748 reads produced 193,168 Trinity sequences with an N50 of 396 bp. In total, 108,704 unigenes were obtained, ranging from 150 to 10,209 bp, with an N50 of 486 bp. These results indicate that the unigenes obtained are suitable for annotation.

Table 1. Overview of the sequencing and assembly process.

Item | Value
Total number of raw reads | 49,500,748
Total nucleotides (nt) | 4,455,067,320
Q20 percentage | 95.84%
GC percentage | 43.22%
Total number of Trinity sequences | 193,168
Total length of Trinity sequences (nt) | 57,498,701
Mean length of Trinity sequences (nt) | 298
N50 of Trinity sequences (nt) | 396
Total number of unigenes | 108,704
Total length of unigenes (nt) | 44,272,045
Mean length of unigenes (nt) | 407
N50 of unigenes (nt) | 486
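The N50 values above summarize assembly contiguity. The Python sketch below implements the standard N50 definition and adds arithmetic checks that use only Table 1 values: total nucleotides equal raw reads times the 90-bp read length, and mean lengths equal total length divided by sequence count. It is illustrative; the statistics code of the assembly pipeline is not described in the paper.

```python
def n50(lengths: list[int]) -> int:
    """N50: the largest length L such that sequences of length >= L
    together cover at least half of the total assembled length."""
    if not lengths:
        raise ValueError("no sequence lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # not reached for non-negative lengths


# Consistency checks against Table 1 (values copied from the table).
RAW_READS, READ_LENGTH = 49_500_748, 90
assert RAW_READS * READ_LENGTH == 4_455_067_320   # total nucleotides
assert round(57_498_701 / 193_168) == 298         # mean Trinity length
assert round(44_272_045 / 108_704) == 407         # mean unigene length
```

For the toy lengths 2 to 10 bp (54 bp in total), the cumulative sum first reaches half the total at the 8-bp sequence, so the N50 is 8 bp.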


Functional annotation of the unigenes

To annotate the assembled non-redundant unigenes, a sequence similarity search was conducted against the NCBI NR and Swiss–Prot protein databases using the BLASTx algorithm21,22) with a cut-off E value of 1e-5. We found that 30,200 (27.78%) and 25,824 (23.76%) unigenes were homologous with sequences in the NCBI NR and Swiss–Prot databases, respectively. To determine whether unigene size affects the annotation rate, we analyzed the relationship between the two. The annotation rate was 12.59% (106/842) for unigenes of approximately 150 bp and increased to 96.45% (1222/1267) for unigenes longer than 2 kb, suggesting that the annotation rate is related to unigene size. We also analyzed the E value and similarity distributions of the top hits in the NR and Swiss–Prot databases. In the NR database, significant homology (E value < 1e-50) and high similarity (greater than 80%) were observed in 21.38 and 12.96% of the sequences, respectively (Fig. 1(A) and (C)). In the Swiss–Prot database, the corresponding proportions were 21.69 and 12.70%, respectively (Fig. 1(B) and (D)).

Fig. 1. Characteristics of the similarity search of unigenes against the NR and Swiss–Prot databases. Notes: (A) E value distribution of BLAST hits for each unigene with a cut-off E value of 1e-5 in the NR database. (B) E value distribution of BLAST hits for each unigene with a cut-off E value of 1e-5 in the Swiss–Prot database. (C) Similarity distribution of the top BLAST hits for each unigene in the NR database. (D) Similarity distribution of the top BLAST hits for each unigene in the Swiss–Prot database.

The Gene Ontology (GO) database includes information on biological processes, molecular functions, and cellular components. It is an internationally standardized gene functional classification system that offers a dynamically updated controlled vocabulary and strictly defined concepts to comprehensively describe the properties of genes and their products in any organism. Based on the BLASTx results, 12,497 unigenes were assigned to one or more GO terms. In total, 78,072 GO assignments were obtained: 39,285 to biological process, 15,507 to molecular function, and 23,280 to cellular component (Fig. 2). In the biological process category, cellular process (7517; 19.13%) was the largest group, followed by metabolic process (5809; 14.79%), biological regulation (3183; 8.10%), and regulation of biological process (2851; 7.26%). In the cellular component category (Fig. 2), 7298 (31.35%) unigenes were assigned to cell or cell part, followed by organelle (5017; 21.54%) and membrane (2887; 12.40%). In the molecular function category (Fig. 2), binding (7309; 47.13%) was the largest group, followed by catalytic activity (5312; 34.26%), transporter activity (837; 5.40%), molecular transducer activity (547; 3.53%), and (370; 2.50%).

All unigenes were searched against the COG database for functional prediction and classification, and 19,671 unigenes were assigned to COG terms (Fig. 3). Among the COG functional categories, "general function prediction only" was the most represented (18.26%), followed by "translation, ribosomal structure, and biogenesis" (9.39%). Extracellular structures (0.08%) and nuclear structures (0.03%) were the least represented COG categories (Fig. 3). To evaluate the completeness of our transcriptome library and the effectiveness of our annotation procedure, we searched the annotated sequences for genes involved in signaling pathways using KEGG pathway mapping.23) Alignment to the KEGG pathway database revealed that 19,701 unigenes were involved in 234 signaling pathways, 9 of which were related to innate immunity: toll-like receptor signaling pathway, endocytosis, chemokine signaling pathway, lysosome, peroxisome, phagosome, primary immunodeficiency, allograft rejection, and graft-versus-host disease. The transcriptome profiling data obtained in this study may contribute to the identification of immune-relevant genes and infection markers in shellfish species.
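According to Materials and methods, BLASTx hits were accepted at E < 1e-5, and when databases disagreed, the direction of a unigene was taken from NR first, then Swiss–Prot, KEGG, and COG. The Python sketch below illustrates that priority rule. The Hit record, its strand field (+1 or −1, assumed to come from the sign of the BLASTx reading frame), and picking the lowest-E hit within a database are assumptions for illustration, not the authors' pipeline.

```python
from dataclasses import dataclass
from typing import Optional

E_VALUE_CUTOFF = 1e-5                           # BLASTx cut-off from Methods
PRIORITY = ("NR", "Swiss-Prot", "KEGG", "COG")  # priority order from Methods


@dataclass(frozen=True)
class Hit:
    database: str  # one of PRIORITY
    evalue: float
    strand: int    # +1 forward, -1 reverse (assumed from BLASTx frame sign)


def assign_direction(hits: list[Hit]) -> Optional[tuple[str, int]]:
    """Return (database, strand) from the highest-priority database that has a
    hit below the cut-off, using that database's lowest-E hit; None if none."""
    for db in PRIORITY:
        passing = [h for h in hits if h.database == db and h.evalue < E_VALUE_CUTOFF]
        if passing:
            best = min(passing, key=lambda h: h.evalue)
            return db, best.strand
    return None
```

For example, if the NR hit fails the cut-off (E = 1e-3) but Swiss–Prot has a hit at E = 1e-8, the Swiss–Prot strand is used, even when a KEGG hit has a much smaller E value. A unigene with no qualifying hit keeps the orientation output by the assembler.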

Fig. 2. GO classification of the assembled unigenes. Note: The results are summarized into three main categories, namely, biological process, cellular component, and molecular function.


Fig. 3. COGs classification. Note: In total, 19,671 sequences were grouped into 25 COG classifications.

Table 2. Summary of the EST–SSR search results.

Item | Value
Total number of examined sequences | 108,704
Total size of examined sequences (bp) | 42,079,763
Total number of identified SSRs | 1764
Number of SSR-containing sequences | 1663
Number of sequences containing more than 1 SSR | 95
Number of SSRs present in compound form | 75
Di-nucleotide repeats | 79.42%
Tri-nucleotide repeats | 18.71%
Tetra-nucleotide repeats | 1.64%
Penta-nucleotide repeats | 0.23%
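Table 2 summarizes a MISA search run with the criteria given in Materials and methods. MISA's own implementation is not reproduced here; the Python sketch below approximates those criteria by finding perfect di- to hexa-nucleotide repeats with at least 6 units and checking for at least 50 bp of flanking sequence on each side for primer design. The regular-expression scan, the rule that skips motifs that are themselves repeats of a shorter unit (such as AA or ATAT), and the simple overlap resolution are assumptions, and compound SSRs are not merged.

```python
import re
from typing import NamedTuple

MIN_REPEATS = 6              # minimum number of motif units (Methods)
MOTIF_LENGTHS = range(2, 7)  # di- to hexa-nucleotide motifs
MIN_FLANK = 50               # bp required on each side for primer design


class SSR(NamedTuple):
    start: int    # 0-based, inclusive
    end: int      # 0-based, exclusive
    motif: str
    repeats: int


def is_primitive(motif: str) -> bool:
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'ATAT')."""
    k = len(motif)
    return all(motif != motif[:d] * (k // d) for d in range(1, k) if k % d == 0)


def find_ssrs(seq: str) -> list[SSR]:
    """Find perfect di- to hexa-nucleotide repeats with >= MIN_REPEATS units."""
    seq = seq.upper()
    found = []
    for k in MOTIF_LENGTHS:
        # A k-mer followed by at least MIN_REPEATS - 1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, MIN_REPEATS - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):
                found.append(SSR(m.start(), m.end(), motif, (m.end() - m.start()) // k))
    # Visit tracts by start position (longer first on ties) and drop any
    # tract that overlaps one already kept.
    found.sort(key=lambda s: (s.start, -(s.end - s.start)))
    kept: list[SSR] = []
    for ssr in found:
        if not kept or ssr.start >= kept[-1].end:
            kept.append(ssr)
    return kept


def has_primer_flanks(seq: str, ssr: SSR, min_flank: int = MIN_FLANK) -> bool:
    """True if at least min_flank bp flank the repeat on both sides."""
    return ssr.start >= min_flank and len(seq) - ssr.end >= min_flank
```

For example, a sequence made of 50 G's, an (AT)8 tract, and 50 C's yields a single SSR (motif AT, 8 units) that qualifies for primer design. The homopolymer runs are skipped because MISA's di- to hexa-nucleotide search does not count them.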

Identification of biomineralization-relevant genes

By searching the annotated unigenes, 91 putative genes potentially involved in biomineralization, such as aspein, MSI60, prismalin-14, shematrin, Perlucin, and Perlwapin, were identified (Table S1). Ten of these candidate biomineralization genes had expression levels of at least 100 RPKM. The expression level of prismalin-14 was the highest (1460.92 RPKM), followed by KRMP-7 (1396.54 RPKM), mantle protein (1269.42 RPKM), and mantle protein 12 (735.48 RPKM). The expression level of tyrosinase-like protein 1 was the lowest (0.17 RPKM) (Table S1).
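Expression levels above are reported in RPKM, but the paper does not give the formula. Under the standard definition, RPKM = 10^9 × C / (N × L), where C is the number of reads mapped to the unigene, N is the total number of mapped reads, and L is the unigene length in bp. The Python sketch below assumes that definition. How paired-end mates were counted is not stated, so C and N are simply treated as counts of whatever unit was mapped.

```python
def rpkm(mapped_reads: int, total_mapped_reads: int, length_bp: int) -> float:
    """Reads per kilobase of transcript per million mapped reads:
    RPKM = 1e9 * C / (N * L), with L in base pairs."""
    if total_mapped_reads <= 0 or length_bp <= 0:
        raise ValueError("total_mapped_reads and length_bp must be positive")
    return mapped_reads * 1e9 / (total_mapped_reads * length_bp)


# Hypothetical numbers: 1,000 reads on a 1-kb unigene, 10 million mapped reads.
print(rpkm(1_000, 10_000_000, 1_000))  # 100.0
```

Dividing by length is why a long unigene needs proportionally more reads than a short one to reach the same RPKM. Dividing by the total mapped reads makes values comparable across libraries of different depth.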

EST-derived SSR detection and primer design

In total, 1764 potential EST–SSRs were identified in the 108,704 unigene sequences examined. Among these sequences, 95 contained more than one EST–SSR, and 75 EST–SSRs were present in compound form (Table 2). On average, one EST–SSR was found every 23.85 kb of unigene sequence, and the frequency of EST–SSR-containing unigenes was 1.53%. Di-nucleotide repeats were the most abundant (79.42%), followed by tri-nucleotide (18.71%), tetra-nucleotide (1.64%), and penta-nucleotide (0.23%) repeats (Table 3). The frequencies of EST–SSRs with different numbers of repeat units were also calculated (Table 3). Among the detected SSRs, 29 motif sequence types were identified, of which 4, 10, 13, and 2 were di-, tri-, tetra-, and penta-nucleotide motifs, respectively. AT/AT (32.31%) was the dominant motif, followed by AG/CT (28.68%), AC/GT (18.31%), ATC/ATG (4.99%), AAT/ATT (3.68%), AAC/GTT (2.66%), and ACC/GGT (2.60%); the remaining 24 motif types together accounted for 6.75% (Fig. 4). To check the amplification success of these SSRs, we used Primer3.0 to design PCR primers with optimal expected product sizes for 56 randomly selected microsatellites. Among the 56 loci tested, 36 (64.3%) were successfully amplified, and 24 (42.9%) of the tested microsatellites were polymorphic across a panel of 30 individuals sampled from the cultured stock. The characteristics of these polymorphic SSRs are listed in Table S2.

Table 3. Distribution of EST–SSRs based on the number of repeat units.

Number of repeat units | Di-nucleotide | Tri-nucleotide | Tetra-nucleotide | Penta-nucleotide | Total
6 | 470 | 189 | 26 | 1 | 686
7 | 270 | 123 | 1 | 1 | 395
8 | 161 | 15 | 1 | 0 | 177
9 | 181 | 1 | 0 | 0 | 182
10 | 196 | 0 | 0 | 1 | 197
11 | 116 | 1 | 0 | 0 | 117
>12 | 7 | 1 | 1 | 1 | 10
Total | 1401 | 330 | 29 | 4 | 1764

Fig. 4. Frequency distribution of EST–SSRs based on motif sequence types. Note: Among the EST–SSRs searched, 31 motif sequence types were identified. The AT/AT di-nucleotide repeat motif was the most abundant motif detected in our EST–SSRs.

Discussion

Transcriptome analysis

In this study, a total of 49,500,748 reads were generated and assembled into 108,704 unigenes with an average length of 407 bp. This mean length is similar to those reported for P. martensii (393 bp8) and 434 bp12)) and P. maxima (447 bp16)), and greater than that reported for H. midae (260 bp24)). Because no reference genome is available, it was difficult to estimate the number of genes and to predict the potential functions of the transcripts. We therefore conducted a BLAST analysis against public protein databases to identify genes in our collection indirectly. Approximately three-fourths of the assembled sequences could not be assigned a specific functional annotation, either because they matched proteins of unknown function or because no homologous sequence was found in the databases. Because the ability to detect significant sequence similarity partly depends on the length of the query sequence, many of the short sequences obtained using next-generation technology could not be matched to known genes.25) In this study, we analyzed the relationship between unigene size and annotation rate and found that 12.59% of the unigenes of approximately 150 bp had BLAST matches, whereas 96.45% of those longer than 2000 bp had hits. A similar trend has been observed in other organisms.8,26,27) The sequences without hits likely include alternative splice variants, novel gene products, and differentially expressed genes, which merit further research. The annotated unigenes are valuable resources for comparative genomic studies.
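The length dependence of annotation discussed above can be tabulated by binning unigenes by length and counting the BLAST-annotated unigenes in each bin. The paper does not describe its binning, so the bin edges and the (length, annotated) record format in this Python sketch are assumptions.

```python
from bisect import bisect_right


def annotation_rate_by_length(records, edges):
    """Group (length_bp, is_annotated) records into length bins.

    edges: sorted lower bounds of the bins, e.g. [150, 200, ..., 2000];
    the last bin is open-ended. Records shorter than edges[0] are skipped.
    Returns {lower_bound: (annotated, total, rate)}.
    """
    counts = {edge: [0, 0] for edge in edges}
    for length, annotated in records:
        idx = bisect_right(edges, length) - 1  # index of largest edge <= length
        if idx < 0:
            continue
        bucket = counts[edges[idx]]
        bucket[1] += 1
        bucket[0] += int(bool(annotated))
    return {e: (a, n, a / n if n else 0.0) for e, (a, n) in counts.items()}
```

With the counts reported in Results, the two extreme size classes give 106/842 ≈ 12.59% and 1222/1267 ≈ 96.45%.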

Identification of biomineralization-relevant genes

Pearl (or shell) formation is a typical biomineralization process, in which living organisms produce minerals to harden or stiffen existing tissues under biological control.28) This complex process involves a series of matrix proteins and genes. In this study, we obtained 91 biomineralization-relevant unigenes with significant similarity to genes that may be involved in pearl or shell formation, including aspein, MSI60, prismalin-14, shematrin, Perlucin, and Perlwapin (Table S1). Aspein, an acidic protein rich in aspartic acid, is involved in the formation of the prismatic layer.29) MSI60 is a framework protein of the nacreous layer and is expressed in the pallial mantle and pearl sac epithelium.30,31) Prismalin-14 is a matrix protein isolated from the acid-insoluble fraction of the prismatic layer of the shell; it plays an important role in regulating calcification of the prismatic layer.32) Shematrin family proteins are basic matrix proteins that are exclusively associated with the prismatic layer and function as a framework for calcification in that layer.33) Perlucin was isolated from the nacre of the abalone H. laevigata and plays an important role in regulating pearl formation and shell biomineralization.34) It also promotes calcium carbonate precipitation under ambient conditions, nucleates calcium carbonate crystallization, and modifies the morphology of calcium carbonate crystals.35) Perlwapin was isolated from the nacreous layer of the shell of H. laevigata and contains whey acidic protein domains; it may contribute to growth inhibition of certain crystallographic planes in the mineral phase of the polymer/mineral composite nacre.36) In this study, we detected perlwapin in the pearl oyster P. maxima. These results suggest that the transcriptome profiling approach used in this study is effective for exploring genes associated with pearl or shell formation.


SSR development and validation

As demonstrated in previous studies, transcriptome sequencing provides a large number of sequences from which numerous EST–SSR markers can be developed. In this study, we screened the pearl oyster transcripts for microsatellites and analyzed their distribution and frequency. A search for di- to hexa-nucleotide repeats yielded a total of 1764 SSRs in 1663 unigenes, accounting for approximately 1.53% (1663/108,704) of the unigenes obtained. This percentage is consistent with the range of 1.40–6.08% previously reported in other shellfish species.37–39) Because different studies use different SSR search tools and criteria, the reported frequencies, types, and distributions of potential SSRs vary considerably. In this study, di-nucleotide repeats were the most frequent SSR motif type, followed by tri-nucleotide repeats, in agreement with the report of Jones et al.40) The most frequent motif types also differ among studies. To assess the amplification success and polymorphism of our new genic SSR markers, we tested 56 randomly selected primer pairs in the stock, of which 36 produced clear bands. This PCR success rate (64.3%) is comparable with those obtained in other shellfish species, for example, 65.0% in the clam Meretrix meretrix,15) 50.0% in the clam Mercenaria mercenaria,38) and 80.7% in the oyster Crassostrea virginica.39) These results suggest that the 1764 potential EST–SSRs identified in this study will provide a valuable resource for developing EST–SSR markers in this species.
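The ratios quoted in this section and in Tables 2 and 3 follow directly from the reported counts. The short Python check below recomputes them using only numbers given in the paper, with the number of SSR-containing unigenes taken from Table 2 and the motif totals from Table 3, and ordinary rounding.

```python
def percent(part: int, whole: int, digits: int) -> float:
    """Return part/whole as a percentage rounded to `digits` decimals."""
    return round(100 * part / whole, digits)


# Counts reported in Tables 2 and 3 and in Results.
TOTAL_UNIGENES = 108_704
EXAMINED_BP = 42_079_763
TOTAL_SSRS = 1764
SSR_CONTAINING_UNIGENES = 1663                                    # Table 2
MOTIF_TOTALS = {"di": 1401, "tri": 330, "tetra": 29, "penta": 4}  # Table 3
PRIMERS_TESTED, AMPLIFIED, POLYMORPHIC = 56, 36, 24

assert sum(MOTIF_TOTALS.values()) == TOTAL_SSRS  # Table 3 columns add up

kb_per_ssr = round(EXAMINED_BP / TOTAL_SSRS / 1000, 2)  # kb of sequence per SSR
ssr_frequency = percent(SSR_CONTAINING_UNIGENES, TOTAL_UNIGENES, 2)
motif_share = {k: percent(v, TOTAL_SSRS, 2) for k, v in MOTIF_TOTALS.items()}
amplification_rate = percent(AMPLIFIED, PRIMERS_TESTED, 1)
polymorphism_rate = percent(POLYMORPHIC, PRIMERS_TESTED, 1)

print(kb_per_ssr, ssr_frequency, motif_share, amplification_rate, polymorphism_rate)
```

Running it gives one SSR per 23.85 kb and an SSR frequency of 1.53%. It also gives amplification and polymorphism rates of 64.3% (36/56) and 42.9% (24/56).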
Transcriptome sequencing is a powerful tool for developing polymorphic DNA markers because of the enormous amount of sequence data and gene-based markers it provides.41) Such markers facilitate the detection of functional variation and signatures of selection in genome scans and association studies.42,43) The large number of SSRs detected in this study should therefore be useful for population genetics, linkage mapping, comparative genomics, and quantitative trait loci identification.

Conclusion The number of assembled unigenes (108,704) and detected SSR markers (1764) derived from the pearl oyster transcriptome indicates that Illumina paired-end sequencing is a fast and cost-effective approach for novel gene discovery and molecular marker development in non-model organisms that lack a reference genome. The data-set obtained in this study will improve our understanding of the molecular mechanisms of pearl or shell formation. These SSR markers and primers may be used in the construction of a genetic linkage map and gene-based association studies.

Supplemental material The supplemental material for this paper is available at http://dx.doi.org/10.1080/09168451.2014.936351.


Funding This study was supported by the National Natural Science Foundation of China [31372526] and the China Agriculture Research System [CARS-048].

References

[1] Jiang YP, He MX. Survey of the studies on Pinctada maxima (Jameson). Mar. Sci. 2009;33:92–96.
[2] Deng YW, Fu S, Liang FL, Xie SH. Effects of stocking density, diet, and water exchange on growth and survival of pearl oyster Pinctada maxima larvae. Aquacult. Int. 2013;21:1185–1194.
[3] Deng YW, Fu S, Liang FL, Du XD, Xie SH. Growth and survival of pearl oyster Pinctada maxima spat reared under different environmental conditions. J. Shellfish Res. 2013;32:675–679.
[4] Ansorge WJ. Next-generation DNA sequencing techniques. New Biotechnol. 2009;25:195–203.
[5] Liu TM, Zhu SY, Tang QM, Chen P, Yu YT, Tang SW. De novo assembly and characterization of transcriptome using Illumina paired-end sequencing and identification of CesA gene in ramie (Boehmeria nivea L. Gaud). BMC Genomics. 2013;14:125.
[6] Joubert C, Piquemal D, Marie B, Manchon L, Pierrat F, Zanella-Cléon I, Cochennec-Laureau N, Gueguen Y, Montagnani C. Transcriptome and proteome analysis of Pinctada margaritifera calcifying mantle and shell: focus on biomineralization. BMC Genomics. 2010;11:613.
[7] Kinoshita S, Wang N, Inoue H, Maeyama K, Okamoto K, Nagai K, Kondo H, Hirono I, Asakawa S, Watabe S. Deep sequencing of ESTs from nacreous and prismatic layer producing tissues and a screen for novel shell formation-related genes in the pearl oyster. PLoS ONE. 2011;6:e21238.
[8] Zhao XX, Wang QH, Jiao Y, Huang RL, Deng YW, Wang H, Du XD. Identification of genes potentially related to biomineralization and immunity by transcriptome analysis of pearl sac in pearl oyster Pinctada martensii. Mar. Biotechnol. 2012;14:730–739.
[9] Takeuchi T, Kawashima T, Koyanagi R, Gyoja F, Tanaka M, Ikuta T, Shoguchi E, Fujiwara M, Shinzato C, Hisata K, Fujie M, Usami T, Nagai K, Maeyama K, Okamoto K, Aoki H, Ishikawa T, Masaoka T, Fujiwara A, Endo K, Endo H, Nagasawa H, Kinoshita S, Asakawa S, Watabe S, Satoh N. Draft genome of the pearl oyster Pinctada fucata: a platform for understanding bivalve biology. DNA Res. 2012;19:117–130.
[10] McGinty EL, Zenger KR, Jones DB, Jerry DR. Transcriptome analysis of biomineralisation-related genes within the pearl sac: host and donor oyster contribution. Mar. Genomics. 2012;5:27–33.
[11] Shi Y, Yu C, Gu Z, Zhan X, Wang Y, Wang A, Wang Y. Characterization of the pearl oyster (Pinctada martensii) mantle transcriptome unravels biomineralization genes. Mar. Biotechnol. 2013;15:175–187.
[12] Huang XD, Zhao M, Liu WG, Guan YY, Shi Y, Wang Q, Wu SZ, He MX. Gigabase-scale transcriptome analysis on four species of pearl oysters. Mar. Biotechnol. 2013;15:253–264.
[13] Jones DB, Jerry DR, Forêt S, Konovalov DA, Zenger KR. Genome-wide SNP validation and mantle tissue transcriptome analysis in the silver-lipped pearl oyster, Pinctada maxima. Mar. Biotechnol. 2013;15:647–658.
[14] Powell W, Machray GC, Provan J. Polymorphism revealed by simple sequence repeats. Trends Plant Sci. 1996;1:215–222.
[15] Wang HX, Huan P, Lu X, Liu BZ. Mining of EST–SSR markers in clam Meretrix meretrix larvae from 454 shotgun transcriptome. Genes Genet. Syst. 2011;86:197–205.
[16] Varshney RK, Graner A, Sorrells ME. Genic microsatellite markers in plants: features and applications. Trends Biotechnol. 2005;23:48–55.
[17] Zane L, Bargelloni L, Patarnello T. Strategies for microsatellite isolation: a review. Mol. Ecol. 2002;11:1–16.
[18] Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, Amit I, Adiconis X, Fan L, Raychowdhury R, Zeng QD, Chen ZH, Mauceli E, Hacohen N, Gnirke A, Rhind N, di Palma FD, Birren BW, Nusbaum C, Lindblad-Toh K, Friedman N, Regev A. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat. Biotechnol. 2011;29:644–652.
[19] Pertea G, Huang XQ, Liang F, Antonescu V, Sultana R, Karamycheva S, Lee YD, White J, Cheung F, Parvizi B, Tsai J, Quackenbush J. TIGR Gene Indices clustering tools (TGICL): a software system for fast clustering of large EST datasets. Bioinformatics. 2003;19:651–652.
[20] Vogel JP, Gu YQ, Twigg P, Lazo GR, Laudencia-Chingcuanco D, Hayden DM, Donze TJ, Vivian LA, Stamova B, Coleman-Derr D. EST sequencing and phylogenetic analysis of the model grass Brachypodium distachyon. Theor. Appl. Genet. 2006;113:186–195.
[21] Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25:3389–3402.
[22] Cameron M, Williams HE, Cannane A. Improved gapped alignment in BLAST. IEEE ACM Trans. Comput. Biol. Bioinf. 2004;1:116–129.
[23] Kanehisa M, Goto S. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 2000;28:27–30.
[24] Franchini P, Merwe M, Roodt-Wilding R. Transcriptome characterization of the South African abalone Haliotis midae using sequencing-by-synthesis. BMC Res. Notes. 2011;4:59.
[25] Novaes E, Drost D, Farmerie W, Pappas G, Grattapaglia D, Sederoff R, Kirst M. High-throughput gene and SNP discovery in Eucalyptus grandis, an uncharacterized genome. BMC Genomics. 2008;9:312.
[26] Wei W, Qi X, Wang L, Zhang Y, Hua W, Li D, Lv H, Zhang X. Characterization of the sesame (Sesamum indicum L.) global transcriptome using Illumina paired-end sequencing and development of EST-SSR markers. BMC Genomics. 2011;12:451.
[27] Fu N, Wang Q, Shen HL. De novo assembly, gene annotation and marker development using Illumina paired-end transcriptome sequences in celery (Apium graveolens L.). PLoS ONE. 2013;8:e57686.
[28] Stephen M.
[29]
[30]
Biomineralization: principles and concepts in bioinorganic materials chemistry. Oxford: Oxford University Press; 2001. Tsukamoto D, Sarashina I, Endo K. Structure and expression of an unusually acidic matrix protein of pearl oyster shells. Biochem. Biophys. Res. Commun. 2004;320:1175–1180. Zhang C, Xie LP, Huang J, Liu XL, Zhang RQ. A novel matrix protein family participating in the prismatic layer framework formation of pearl oyster, Pinctada fucata. Biochem. Biophys. Res. Commun. 2006;344:735–740.
[31] Sato Y, Inoue N, Ishikawa T, Ishibashi R, Obata M, Aoki H, Atsumi T, Komaru A. Pearl microstructure and expression of shell matrix protein genes MSI31 and MSI60 in the pearl sac epithelium of Pinctada fucata by in situ hybridization. PLoS ONE. 2013;8:e52372.
[32] Suzuki M, Murayama E, Inoue H, Ozaki N, Tohse H, Kogure T, Nagasawa H. Characterization of Prismalin-14, a novel matrix protein from the prismatic layer of the Japanese pearl oyster, Pinctada fucata. Biochem. J. 2004;382:205–213.
[33] Yano M, Nagai K, Morimoto K, Miyamoto H. Shematrin: a family of glycine-rich structural proteins in the shell of the pearl oyster Pinctada fucata. Comp. Biochem. Physiol. B Biochem. Mol. Biol. 2006;144:254–262.
[34] Mann K, Weiss IM, André S, Gabius HJ, Fritz M. The amino-acid sequence of the abalone (Haliotis laevigata) nacre protein perlucin. Eur. J. Biochem. 2000;267:5257–5264.
[35] Blank S, Arnoldi M, Khoshnavaz S, Treccani L, Kuntz M, Mann K, Grathwohl G, Fritz M. The nacre protein perlucin nucleates growth of calcium carbonate crystals. J. Microsc. 2003;212:280–291.
[36] Treccani L, Mann K, Heinemann F, Fritz M. Perlwapin, an abalone nacre protein with three four-disulfide core (Whey Acidic Protein) domains, inhibits the growth of calcium carbonate crystals. Biophys. J. 2006;91:2601–2608.
[37] Li HJ, Liu WD, Gao XG, Zhu D, Wang J, Li YF, He CB. Identification of host-defense genes and development of microsatellite markers from ESTs of hard clam Meretrix meretrix. Mol. Biol. Rep. 2011;38:769–775.
[38] Wang Y, Wang AM, Guo XM. Development and characterization of polymorphic microsatellite markers for the Northern Quahog Mercenaria mercenaria (Linnaeus, 1758). J. Shellfish Res. 2010;29:77–82.
[39] Wang Y, Guo XM. Development and characterization of EST-SSR markers in the eastern oyster Crassostrea virginica. Mar. Biotechnol. 2007;9:500–511.
[40] Jones DB, Zenger KR, Jerry DR. In silico whole-genome EST analysis reveals 2322 novel microsatellites for the silver-lipped pearl oyster, Pinctada maxima. Mar. Genomics. 2011;4:287–290.
[41] Lu FH, Yoon MY, Cho Y, Chung JW, Kim KT, Cho MC, Cheong SY, Park YJ. Transcriptome analysis and SNP/SSR marker information of red pepper variety YCM334 and Taean. Sci. Hortic-Amsterdam. 2011;129:38–45.
[42] Vasemagi A, Nilsson J, Primmer CR. Expressed sequence tag-linked microsatellites as a source of gene-associated polymorphisms for detecting signatures of divergent selection in Atlantic salmon (Salmo salar L.). Mol. Biol. Evol. 2005;22:1067–1076.
[43] Bouck A, Vision T. The molecular ecologist’s guide to expressed sequence tags. Mol. Ecol. 2007;16:907–924.