MARGEN-00298; No of Pages 3 Marine Genomics xxx (2015) xxx–xxx

Contents lists available at ScienceDirect

Marine Genomics journal homepage: www.elsevier.com/locate/margen

Genomics/Technical resources

In-depth transcriptome analysis of Coilia ectenes, an important fish resource in the Yangtze River: de novo assembly, gene annotation

Huaishun Shen a,b, Ruobo Gu a, Gangchun Xu a, Pao Xu a,b,⁎, Zijun Nie a, Yacheng Hu b

a Key Laboratory of Freshwater Fisheries and Germplasm Resources Utilization, Ministry of Agriculture, Freshwater Fisheries Research Center, Chinese Academy of Fishery Sciences, Wuxi 214081, PR China
b Wuxi Fisheries College, Nanjing Agricultural University, Nanjing, 210095, PR China

Article info

Article history: Received 10 January 2015; Received in revised form 5 March 2015; Accepted 5 March 2015; Available online xxxx

Keywords: Coilia ectenes; Transcriptome; Transcript; Next generation sequencing

Abstract

Coilia ectenes is an important teleost species in the Yangtze River and a model organism that can be used to study the protection of fish resources. In this report, we performed de novo transcriptome sequencing of ten cDNA libraries from the brain, gill, heart, intestine, kidney, liver, muscle, stomach, ovary, and testis tissues. A total of 352 million raw reads of 100 base pairs were generated, and 130,113 transcripts, corresponding to 65,350 non-redundant transcripts, with a mean length of 1520 bp, were assembled. BLASTx-based gene annotation (E-value < 1 × 10⁻⁵) allowed 73,900 transcripts to be annotated against at least one of four databases: the NCBI non-redundant (NR) database, the GO database, the COG database, and the KEGG database. Our study provides a valuable resource of C. ectenes genomic and transcriptomic data that will facilitate future functional studies of C. ectenes. © 2015 Published by Elsevier B.V.

1. Introduction

Coilia ectenes (Jordan and Seale, 1905), also known as Coilia nasus and commonly called the Japanese grenadier anchovy, is an important fishery resource in the Yangtze River. Indeed, C. ectenes is known locally as one of the “three Yangtze flavors”, along with Tenualosa reevesii (Reeve's shad) and Takifugu fasciatus (obscure pufferfish). C. ectenes is a member of the family Engraulidae, order Clupeiformes, and is widely distributed in the middle and lower Yangtze River and its affiliated lakes, where it is utilized in aquaculture (Cheng and Lu, 2005; Yuan et al., 1980). The C. ectenes harvest from the Yangtze River has decreased severely since the 1970s; annual production fell from 3750 tons in 1973 to 12 tons in 2011 (Wei et al., 2012; Zhang et al., 2005). This decline is a result of multiple factors, including environmental pollution, overfishing, and other human activity. To protect C. ectenes in the Yangtze River, coordinated research across different fields is required, including population genetics, development and reproduction, physiology, and nutrition. Such research requires considerable background information on the C. ectenes genome, but available data in this area are severely lacking.

In recent years, next generation sequencing technologies have been developed to explore the genomes of model and non-model organisms. Compared to traditional Sanger sequencing technology, next generation sequencing platforms are revolutionary in their ability to provide enormous amounts of sequence data with a greater breadth and depth of information (Metzker, 2010). Many fish species have been studied using next generation sequencing technologies, including the common carp (Cyprinus carpio), the common sole (Solea solea), the Atlantic bonito (Sarda sarda), and the sea bass (Dicentrarchus labrax) (Ferraresso et al., 2013; Magnanou et al., 2014; Sarropoulou et al., 2014; Wang et al., 2012). These reports have advanced our understanding of fish genomes and gene functions. In this study, the transcriptomes of ten major organs in C. ectenes were sequenced using Illumina HiSeq sequencing technology. We believe that the data obtained from this study represent an important resource for C. ectenes research.

⁎ Corresponding author at: Key Laboratory of Freshwater Fisheries and Germplasm Resources Utilization, Ministry of Agriculture, Freshwater Fisheries Research Center, Chinese Academy of Fishery Sciences, Wuxi 214081, PR China. Fax: +86 510 85551474. E-mail address: [email protected] (P. Xu).

2. Data description

2.1. Ethics statement

This study was approved by the Animal Care and Use Committee of Freshwater Fisheries Research Center at the Chinese Academy of Fishery Sciences. All surgery was performed under sodium pentobarbital anesthesia, and all efforts were made to minimize suffering.

2.2. Illumina sequencing of ten C. ectenes tissue types

A total of twenty wild C. ectenes weighing approximately 100–200 g each were collected from the Jiangyin section of the Yangtze River, Jiangsu Province, China. Ten tissue types (brain, gill, heart, intestine, kidney, liver, muscle, ovary, stomach, and testis) were collected, and each tissue group contained samples from approximately ten fish. Approximately 5 μg of DNase-treated total RNA isolated from each tissue type was used to construct each cDNA library, and the libraries were subjected to 100 bp paired-end (PE) sequencing on an Illumina HiSeq 2000 instrument. A total of 352,562,792 100 bp paired-end reads were generated; the ovary library yielded the most reads (50,227,728) and the testis library the fewest (23,451,624) (Table 1). After removal of low-quality reads, a total of 289,082,844 cleaned reads (81.99%) were obtained and subjected to further analysis. All raw sequence data were deposited in the NCBI Sequence Read Archive (SRA) under accession code SRP043258.

Table 1 Statistics of sequence data of C. ectenes.

Tissue     Raw reads     High quality reads   High quality base pairs   Q20%
Brain      28,396,948    22,215,632           2,200,213,803             99.44%
Gill       41,482,612    30,116,114           2,981,778,522             99.44%
Heart      27,294,702    23,262,350           2,303,606,393             99.36%
Intestine  33,390,964    26,841,796           2,660,884,952             99.47%
Kidney     38,243,102    32,666,502           3,233,466,575             99.33%
Liver      31,221,336    25,036,136           2,481,654,472             99.47%
Muscle     44,608,948    37,914,734           3,760,280,270             99.38%
Ovary      50,227,728    42,630,072           4,224,722,540             99.46%
Stomach    34,244,828    28,606,202           2,835,747,734             99.46%
Testis     23,451,624    19,793,306           1,956,828,568             99.39%
Total      352,562,792   289,082,844          28,639,183,829

2.3. De novo assembly of the transcriptome

The cleaned reads were then assembled using the Trinity software (Grabherr et al., 2011). De novo assembly of the 289,082,844 high-quality reads generated a total of 130,113 transcripts, ranging from 351 to 31,889 bp in length (Table 2). The consensus sequences of the transcripts are supplied in Supplementary file 1. The average length was 1520 bp; the most abundant length class was 401–600 bp (21.13% of transcripts), followed by the 601–800 bp group (11.67%) and the 1–400 bp group (9.29%) (Supplementary file 2). These 130,113 transcripts correspond to 65,350 non-redundant transcripts, a difference attributed to alternative splicing. The expression levels of each transcript in the ten studied C. ectenes tissues (brain, gill, heart, intestine, kidney, liver, muscle, ovary, stomach, and testis) were compared using FPKM values (Supplementary file 3). This library of C. ectenes transcripts represents a data source for future experiments and significantly enriches our knowledge of the C. ectenes genome.

Table 2 Summary of Illumina HiSeq 2000 assembly and analysis of C. ectenes transcriptomic sequences.

Type                       Number
Total genes                65,350
Total transcripts          130,113
Total residues (bp)        197,809,642
Average length (bp)        1,520
Largest transcript (bp)    31,889
Smallest transcript (bp)   351
N50 (bp)                   2,319

2.4. Functional annotation and GO classification

Likely protein-coding regions of the transcripts were predicted using the Trinity software. Of the 130,113 transcripts, 72,297 were found to contain open reading frames (ORFs) (Supplementary file 4). These transcripts likely represent genes that play essential roles in C. ectenes biological processes. All transcripts were functionally annotated using BLASTx and Blast2GO against the NCBI NR, GO, KEGG, and COG databases with an E-value cutoff of 1 × 10⁻⁵. A total of 73,900 transcripts (56.8% of the total) had significant hits in these databases: 70,777 transcripts (54.4% of the total) in the NR database, 40,688 transcripts (31.27% of the total) in the GO database, 32,882 transcripts (25.27% of the total) in the KEGG database, and 15,055 transcripts (11.57% of the total) in the COG database (Supplementary file 4). Most of the annotated transcripts contained ORFs, and 56,213 transcripts (43.2% of the total) produced no hits in any of the databases.

A total of 40,688 transcripts (16,626 non-redundant transcripts) matched at least one GO term in the three major categories that were considered: biological processes, molecular functions, and cellular components. The numbers of transcripts in each GO category are shown in Fig. 1. Transcripts with hits in the molecular function ontology terms made up the largest group (33,073 transcripts, 81.28%), followed by the biological process terms (32,120 transcripts, 78.94%) and the cellular component terms (27,693 transcripts, 68.06%).

Finally, two kinds of molecular markers, single-nucleotide polymorphisms (SNPs) and microsatellite sequences, or simple sequence repeats (SSRs), were identified from the C. ectenes transcriptome sequencing data. SNPs are the most common type of variation in the genome, and 546,863 SNPs were obtained from our data using SAMtools and VarScan (Supplementary file 5). SSRs are polymorphic loci present in genomic DNA that consist of repeated core sequences of 2–6 bp in length. A total of 33,486 SSRs were initially identified in the 130,113 C. ectenes transcripts (Supplementary file 6).

Fig. 1. Gene ontology (GO) classification of transcripts from C. ectenes. GO terms were analyzed by Blast2GO and categorized at level 2 under three main categories (biological process, cellular component, and molecular function). In total, 40,688 transcripts with BLAST matches to known proteins were assigned to gene ontology terms.

Supplementary data to this article can be found online at http://dx.doi.org/10.1016/j.margen.2015.03.002.

Acknowledgments

This work was supported by a grant from the National Special Research Fund for Non-Profit Sector (201203065) and the National Key Technology R&D Program (2012BAD26B05).

References

Cheng, Q.Q., Lu, D.R., 2005. PCR-RFLP analysis of cytochrome b gene does not support Coilia ectenes taihuensis being a subspecies of Coilia ectenes. J. Genet. 84 (3), 307–310.
Ferraresso, S., Bonaldo, A., Parma, L., Cinotti, S., Massi, P., Bargelloni, L., Gatta, P.P., 2013. Exploring the larval transcriptome of the common sole (Solea solea L.). BMC Genomics 14, 315.
Grabherr, M.G., Haas, B.J., Yassour, M., Levin, J.Z., Thompson, D.A., Amit, I., Adiconis, X., Fan, L., Raychowdhury, R., Zeng, Q., Chen, Z., Mauceli, E., Hacohen, N., Gnirke, A., Rhind, N., di Palma, F., Birren, B.W., Nusbaum, C., Lindblad-Toh, K., Friedman, N., Regev, A., 2011. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat. Biotechnol. 29 (7), 644–652.
Magnanou, E., Klopp, C., Noirot, C., Besseau, L., Falcon, J., 2014. Generation and characterization of the sea bass Dicentrarchus labrax brain and liver transcriptomes. Gene 544 (1), 56–66.
Metzker, M.L., 2010. Sequencing technologies - the next generation. Nat. Rev. Genet. 11 (1), 31–46.
Sarropoulou, E., Moghadam, H.K., Papandroulakis, N., De la Gandara, F., Ortega Garcia, A., Makridis, P., 2014. The Atlantic Bonito (Sarda sarda, Bloch 1793) transcriptome and detection of differential expression during larvae development. PLoS One 9 (2), e87744.
Wang, J.T., Li, J.T., Zhang, X.F., Sun, X.W., 2012. Transcriptome analysis reveals the time of the fourth round of genome duplication in common carp (Cyprinus carpio). BMC Genomics 13, 96.
Wei, G., Xu, G., Gu, R., Xu, P., 2012. Research Development of Biology and Artificial Breeding in Coilia nasus (in Chinese). J. Yangtze Univ. (Nat. Sci. Edit.) 9 (7), 31–36.
Yuan, C.M., Qing, A.L., Liu, R.H., 1980. On the classification of the anchovies, Coilia from the lower Yangtze River and the southeast coast of China. J. Nanjing Univ. 3, 67–82.
Zhang, M., Xu, D., Liu, K., Shi, W., 2005. Studies on Biological Characteristics and Change of Resource of Coilia Nasus Schlegel in the Lower Reaches of the Yantze River (in Chinese). Resour. Environ. Yangtze Basin 14 (6), 694–698.

http://dx.doi.org/10.1016/j.margen.2015.03.002 1874-7787/© 2015 Published by Elsevier B.V.
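The percentages quoted in Sections 2.2 to 2.4 can be re-derived from the reported counts. The following is a minimal sketch of that cross-check, not the authors' pipeline: every count is copied from the text and tables, while the helper `pct` and all variable names are hypothetical and introduced only for illustration. The GO category percentages are assumed to use the 40,688 GO-annotated transcripts as the denominator, which is the value that reproduces the reported figures.

```python
# Cross-check of summary statistics quoted in the text.
# Counts are copied from the paper; names and layout are illustrative
# assumptions, not part of the authors' analysis code.

def pct(part: int, whole: int, digits: int = 2) -> float:
    """Return part/whole as a percentage rounded to `digits` decimals."""
    return round(100.0 * part / whole, digits)

RAW_READS = 352_562_792          # Table 1, total raw reads
CLEAN_READS = 289_082_844        # Table 1, total high quality reads
TOTAL_TRANSCRIPTS = 130_113      # Table 2
TOTAL_RESIDUES_BP = 197_809_642  # Table 2

# Transcripts with significant hits (E-value cutoff 1e-5) per database.
ANNOTATED = {
    "any": 73_900,
    "NR": 70_777,
    "GO": 40_688,
    "KEGG": 32_882,
    "COG": 15_055,
}
UNANNOTATED = 56_213

# Transcripts per top-level GO category.
# Assumed denominator: the GO-annotated transcripts.
GO_CATEGORIES = {
    "molecular function": 33_073,
    "biological process": 32_120,
    "cellular component": 27_693,
}

clean_rate = pct(CLEAN_READS, RAW_READS)
mean_length = round(TOTAL_RESIDUES_BP / TOTAL_TRANSCRIPTS)
annotation_pct = {db: pct(n, TOTAL_TRANSCRIPTS) for db, n in ANNOTATED.items()}
go_pct = {cat: pct(n, ANNOTATED["GO"]) for cat, n in GO_CATEGORIES.items()}

if __name__ == "__main__":
    print(f"Reads retained after filtering: {clean_rate}%")
    print(f"Mean transcript length: {mean_length} bp")
    for db, value in annotation_pct.items():
        print(f"Annotated ({db}): {value}%")
    for cat, value in go_pct.items():
        print(f"GO {cat}: {value}%")
```

Deriving each percentage from a single shared denominator makes it easy to spot a mistyped count, since an incorrect value would no longer reproduce the reported figure.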

Please cite this article as: Shen, H., et al., In-depth transcriptome analysis of Coilia ectenes, an important fish resource in the Yangtze River: de novo assembly, gene annotation..., Mar. Genomics (2015), http://dx.doi.org/10.1016/j.margen.2015.03.002
