Plant and Cell Physiology Advance Access published December 4, 2014

Sinbase: An Integrated Database to Study Genomics, Genetics, and Comparative Genomics in Sesamum indicum

Running head: Sesame genomics database

Corresponding author: Prof. Xiurong Zhang

Biology and Genetic Improvement of Oil Crops, Ministry of Agriculture, P.R.China. No.2, Xudong Second Road, Wuchang District, Wuhan, Hubei Province, the People's Republic of China. Postal code: 430062 Tel: 86-27-86811836, Fax: 86-27-86811836, Email: [email protected]

Subject areas: Genomics, Systems Biology and Evolution

Black and white figures: 1; Colour figures 2; Tables: 1; Supplementary Tables: 1;

© The Author 2014. Published by Oxford University Press on behalf of Japanese Society of Plant Physiologists. All rights reserved. For Permissions, please e-mail: [email protected] 1

Downloaded from http://pcp.oxfordjournals.org/ at Monash University on December 7, 2014

Oil Crops Research Institute of the Chinese Academy of Agricultural Sciences, Key Laboratory of

Sinbase: An Integrated Database to Study Genomics, Genetics, and Comparative Genomics in Sesamum indicum Running head: Sesame genomics database

Linhai Wang

, Jingyin Yu

1, †

1

1,

, Donghua Li , Xiurong Zhang *

Oil Crops Research Institute of the Chinese Academy of Agricultural Sciences, Key Laboratory of

Biology and Genetic Improvement of Oil Crops, Ministry of Agriculture, P.R.China.



These authors contributed equally to this work.

* Corresponding author: [email protected].

Abbreviations: SNP: Single Nucleotide Polymorphism SSR: Simple Sequence Repeat InDel: Insertion and Deletion

2

Downloaded from http://pcp.oxfordjournals.org/ at Monash University on December 7, 2014

1

1, †

Abstract: Sesame (Sesamum indicum L.) is an ancient and important oilseed crop grown widely in tropical and subtropical areas. It belongs to the gigantic order Lamiales, which includes many well-known or economically important species, such as olive (Olea europaea), leonurus (Leonurus japonicus), and lavender (Lavandula spica), many of which have important pharmacological properties. Despite their

reference genome information. The now-available S. indicum genome will provide an unprecedented opportunity for studying S. indicum genetic traits and comparative genomics alike. To deliver S. indicum genomic information to the worldwide research community, we designed Sinbase, a Webbased database with comprehensive sesame genomics, genetics, and comparative genomics information. Sinbase includes sequences of assembled sesame pseudomolecular chromosomes, protein-coding genes (27,148), transposable elements (372,167), and noncoding RNAs (1,748). In particular, Sinbase provides unique and valuable information on colinear regions with various plant genomes, including Arabidopsis thaliana, Glycine max, Vitis vinifera, and Solanum lycopersicum. Sinbase also provides a useful search function and data mining tools, including a keyword search and local BLAST service. Sinbase will be updated regularly with new features, improvements to genome annotation, and new genomic sequences, and is freely accessible at http://ocri-genomics.org/Sinbase/.

Keywords: Comparative genomics • Database • Genetics • Genomics • Sesamum indicum.

3

Downloaded from http://pcp.oxfordjournals.org/ at Monash University on December 7, 2014

importance, genetic and genomic analyses on these species have been insufficient due to a lack of

Introduction Sesame (Sesamum indicum L.), an ancient and important oilseed crop, has earned the poetic label "queen of oilseeds" due to the high oil yield and quality of the seed. The higher oil content and fewer lipid-related genes in the small and diploid genome of sesame make it an invaluable plant model for studying oil biosynthesis (Wang et al. 2014). In addition, sesame oil has an excellent composition:

~1:1) (Wei et al. 2013), yet it has an outstanding shelf life due to the presence of anti-oxidative furofuran lignans (Budowski et al. 1950). The major sesame lignan, sesamin, has been a focus of pharmacologists and was demonstrated to possess many potent pharmacological properties such as decreasing blood lipids (Hirata et al. 1996) and lowering cholesterol levels (Chen et al. 2005). Sesame (2n = 2x = 26) is a member of the family Pedaliaceae and order Lamiales (The Angiosperm Phylogeny 2009). This order includes many well-known or economically important species, such as olive (Olea europaea), leonurus (Leonurus japonicus), lavender (Lavandula spica), and basil (Ocimum basilicum). Most of these species have been lacking in genetic and genomic studies. Recently, with high-throughput sequencing technology becoming routine, a relatively advanced study in sesame molecular biology was performed. Several simple sequence repeat (SSR) markers have been developed in sesame (Spandana et al. 2012; Wang et al. 2012a; Zhang et al. 2012). The loci associated with indehiscent capsule trait, definite growth habit, and seed coat color have been detected (Uzun and Çağırgan 2009; Uzun et al. 2003; Zhang et al. 2013), and the expression levels of sesame genes have been explored using Sanger and high-throughput DNA sequencing technologies (Suh et al. 2003; Wang et al. 2012b; Wei et al. 2011). The phylogenetic position of sesame has been

4

Downloaded from http://pcp.oxfordjournals.org/ at Monash University on December 7, 2014

more than 80% is made up of reputable unsaturated fatty acids (mainly oleic acid and linoleic acid,

determined using chloroplast genomic data, which indicated the core lineage of Sesamum in Lamiales (Yi and Kim 2012). More specifically, the completed de novo assembly of the cultivar ‘Zhongzhi No.13’ genomic sequence provided new insights into sesame. Sesame was estimated to have diverged from the Lamiales species Utricularia gibba ~98 million years ago (68.6–145.2 MYA) and from the tomato–potato lineage ~125 MYA (89.8–185.8 MYA). Its recent whole genome duplication

tomato–potato lineage (Wang et al. 2014). Until now, few public resources have existed for sesame genome data excluding the sesame genome project Web site (http://www.sesamegenome.org/) and National Center for Biotechnology Information (NCBI; http://www.ncbi.nlm.nih.gov/) (Pruitt et al. 2007). However, the sesame genome project Web site did not supply detailed sesame genomic or genetic data sets at the time of writing. With respect to sesame, the NCBI presents only partial and less-than-systematic data, such as genome survey sequences (GSSs), expressed sequence tag sequences (ESTs), and transcriptome data in Gene Expression Omnibus (GEO) data sets (Barrett et al. 2009). To systematically visualize and understand sesame genomics and genetics, we developed a comprehensive Web-based database, referred to as Sinbase (http://ocri-genomics.org/Sinbase), which includes genomic, genetic, and comparative genomic information. This user-friendly database is the primary genomic resource for Sesamum, which will serve as an infrastructure for researchers to study the molecular function of genomic components, comparative genomics, and evolution in related Lamiales species. In this study, we also present an overview of the major sections of Sinbase and some of the tools available for data mining and database searches (Fig. 1).

5

Downloaded from http://pcp.oxfordjournals.org/ at Monash University on December 7, 2014

(WGD) event was inferred to occur independently in the parallel period of the triplication event in the

[Figure 1 approximately here] Database implementation We developed an integrated database of genomic, genetic, and comparative genomic information for the sesame genome. It is currently hosted on a CentOS 5.4 Linux operating system with the Apache Web server. The user-friendly interface was developed using Perl/CGI, Python, and JavaScript.

pseudomolecular chromosomes, and gene structures were generated using the SVG software package and Perl GD module from the Comprehensive Perl Archive Network (CPAN) Web site (Hietaniemi 2008). Basic sesame genome data sets were analyzed using in-house Perl scripts, and all data sets related to the sesame genome were stored in a MySQL rational database management system. GBrowse 2.0 and GBrowse_syn software packages were adapted for visualization of the distribution of genomic components on pseudomolecular chromosomes and relationships colinear to the sesame genome between species (Ashburner et al. 2000; Donlin 2009; McKay et al. 2010).

Database contents and functions Currently, Sinbase has collected 16 genetic linkage groups (pseudomolecular chromosomes) of sesame anchored by 406 genetic markers including single-nucleotide polymorphisms (SNPs), SSRs, and insertion/deletion (InDel) genetic markers, as well as 16,296 scaffolds. Based on these genomic data sets, Sinbase includes all genomic components generated by open-resource software with stringent parameters comprising 27,148 predicted protein-coding genes, 372,167 transposable elements, and 1,748 noncoding RNAs (Table 1). With regard to sesame genetics, Sinbase supplies

6

Downloaded from http://pcp.oxfordjournals.org/ at Monash University on December 7, 2014

Graphical compositions of pseudomolecular chromosomes, distribution of genetic markers on

not only the genetic distribution of the 406 DNA markers on pseudomolecular chromosomes, but also their basic information. Moreover, Sinbase offers a comprehensive analysis of comparative genomics, which are displayed using Circos software (Krzywinski et al. 2009) between S. indicum and Arabidopsis thaliana (Huala et al. 2001), Glycine max (Schmutz et al. 2010), Vitis vinifera (Jaillon et al. 2007), and Solanum lycopersicum (The Tomato Genome Consortium 2012) using genomic data from

[Table 1 approximately here]

Genomic component Three typical sesame genomic components, including putative genes (27,148), transposable elements (372,167), and noncoding RNA (1,748), were collected for Sinbase. For each putative gene, Sinbase supplies the functional description, genome analysis and sequences for the sesamepredicted gene. Repetitive elements in sesame are identified employing four methods including de novo RepeatMasker (ab initio prediction repeats), Proteinmask (prediction by Repeat Protein Mask), RepeatMasker (prediction by RepeatMasker; Repbase is the default library file), and Tandem Repeats Finder (TRF, especially for the analysis of simple repeated sequences) (Wang et al. 2014), Then the four sets of repetitive elements were deposited by each linkage group without merging, and all of them were provided with sequences, positions and types. According to specified characteristics of conserved motifs, conserved domains, and sequence similarities of noncoding RNAs, 1,748 noncoding RNAs were identified in the genome and compiled into Sinbase, leading to a total of 207

7

Downloaded from http://pcp.oxfordjournals.org/ at Monash University on December 7, 2014

freely available databases.

microRNAs (miRNAs), 386 ribosomal RNAs (rRNAs: 18S, 28S, 5.8S, and 5S), 268 small nuclear RNAs (snRNAs: CD-box, HACA-box, and splicing), and 870 transfer RNAs (tRNAs) (Fig. 2).

[Figure 2 approximately here]

The sesame genome is estimated to be ~357 Mb in size (Wang et al. 2014). Using the Illumina Hiseq2000 platform, the sesame genome was sequenced and assembled based on paired-end short reads. In total, 274 Mb (77.4% of the estimated genome) of the genome was obtained, which contained 14,595 unique contigs and 1,851 scaffolds with contig N50 of 52.2 kb and scaffold N50 of 2.1 Mb. The 150 larger scaffolds (~85.3% of the assembled sesame genome) were successfully anchored into 16 genetic linkage groups using a newly constructed genetic map. In Sinbase, the genetic map was compiled and displayed for the 406 critical genetic markers distributed in 251 SNPs, 116 SSRs, and 39 InDel genetic markers.

Comparative genomics To illustrate the evolutionary relationship and species divergence between sesame and other remarkable species or its relatives, such as A. thaliana, G. max, S. lycopersicum, and V. vinifera at both the inter- and intragenomic level, whole-genome comparative analyses were performed, and the colinear regions between sesame and other species were generated using MCscan software (Tang et al. 2008). Sinbase collected 1,058 colinear blocks, including 21,506 orthologous gene pairs among

8

Downloaded from http://pcp.oxfordjournals.org/ at Monash University on December 7, 2014

Genetic linkage groups

A. thaliana, G. max, S. lycopersicum, and V. vinifera genomes compared to the S. indicum genome (Supplementary Table 1). Using MCscan software, orthologous gene pairs were detected based on an all-against-all BLAST search (Altschul et al. 1997) using protein sequences with a strict parameter: E-value cutoff ≤1e-05 between species. The colinear regions were scanned with the following parameters, e = 1e-20, u = 1, and s = 5, which required a minimum of five consecutive orthologous

indicum compared to A. thaliana, 377 in S. indicum compared to G. max, 358 in S. indicum compared to V. vinifera, and 420 in S. indicum compared to S. lycopersicum (Fig. 3).

[Figure 3 approximately here]

Utility of Sinbase Sinbase is the first comprehensive and integrated resource platform for sesame genomics, genetics, and comparative genomics and provides a user-friendly interface for easy access to sesame genome information. Two critical functional modules were designed in Sinbase: 'Browse' and 'Search' modules. From the 'Browse' module, users can browse sesame phenotypes, sesame genome components, genetics linkage groups, and colinear relationships among species in comparison to sesame genome, as well as the distribution of genome components on sesame pseudomolecular chromosomes shown by GBrowse (Donlin 2009). According to different sections in this module, internal links were equipped to help users maximise on the available information. The 'Search' module was developed to help users extract sesame genome information, including genome components and corresponding

9

Downloaded from http://pcp.oxfordjournals.org/ at Monash University on December 7, 2014

gene pairs in colinear regions. Ultimately, 1,058 colinear regions were detected including 353 in S.

sequences, orthologous genes of sesame in corresponding species, and colinear regions between sesame and corresponding species on pseudomolecular chromosomes. The sequence similarity search function using a standard NCBI BLAST program (Altschul et al. 1997) was included in this module for the user to query nucleotide or protein sequences against a database containing wholegenome data sets of S. indicum, A. thaliana, G. max, V. vinifera, and S. lycopersicum. Data from the

Browse Users can browse information on five categories: phenotypes, genome components, genetic linkage map, comparative genomics, and Genome Browse. The ‘phenotypes’ tab was designed to exhibit the phenotype variation of sesame, and help users in other fields get perceptual knowledge about sesame quickly. It collected a variety of sesame with typical phenotypes, including those with indefinite or definite growth habits, uniculm or branching styles, single or multiple flowers per axil, single or multiple capsules per axil, two or more carpels in capsule, opened or closed corolla, normal or dwarf plant, and those with different flower color, capsule color, seed coat color, capsule size, seed size so on. For further information, please refer to the file “Descriptors for sesame”, which is freely accessible

at

http://www.bioversityinternational.org/e-library/publications/detail/descriptors-for-

sesame-emsesamumem-spp/. For the ‘genome components’ navigation tab, a comprehensive annotation was compiled for three key components of sesame genomics. Putative genes are classified by basic information, genome analysis and sequences, as well as annotation by Gene Ontology (GO) (Ashburner et al. 2000), InterPro (Quevillon et al. 2005), Swiss-Prot, TrEMBL

10

Downloaded from http://pcp.oxfordjournals.org/ at Monash University on December 7, 2014

sesame genome collected in Sinbase are freely available in bulk.

(O'Donovan et al. 2002), and the Kyoto Encyclopedia of Genes and Genomes (KEGG) (Kanehisa et al. 2008). Each repeat element class is organized into different pseudomolecular chromosomes and is shown as a list in table format. For every repeat element entry, Sinbase offers basic information, including methods on how to get this entry, sequence of the entry, and type that the entry belongs to, as well as parameters to generate the entry. Similar to repeat element entry, basic information on

genetic linkage groups of sesame genetic markers, which allows users to understand them intuitively. According to what the genetic linkage groups exhibit, users can retrieve the genetic distances and order of those markers, as well as the corresponding scaffolds anchored on pseudomolecular chromosomes. Furthermore, users can obtain basic information, including marker type, physical location on pseudomolecular chromosomes, and flanking sequences for each sesame genetic marker, which could be employed by genetic breeders and researchers to further their knowledge underlying special traits and phenotypes. For comparative genomics, the relationships between S. indicum and A. thaliana, G. max, V. vinifera, and S. lycopersicum genomes are curated in this section, and their colinear regions are displayed using Circos software (Krzywinski et al. 2009). Statistical analyses of gene pairs between colinear regions among different species are scatterplotted by the R programming language. GBrowse_syn is used to display every colinear region between two species to offer users a dynamic and intuitive description (Donlin 2009). Moreover, sequence identities between orthologous gene pairs in colinear regions are displayed in gene pair lists. GBrowse was employed to show the distribution of genome components of the sesame genome on pseudomolecular chromosomes (McKay et al. 2010).

11

Downloaded from http://pcp.oxfordjournals.org/ at Monash University on December 7, 2014

noncoding RNAs are also supplied. Sinbase also supplies graphics regarding the distribution on

Search Multiple-function search modules were designed to meet a variety of needs for researchers to extract interesting information on the sesame genome in the plant science community. Sinbase search functions refer to three parts according to sesame genome resources deposited in the database. First,

users a multichannel search function to obtain sesame genomic component information. This function can be used to access annotations and sequences of putative genes, transposable elements, and noncoding RNAs. Second, the orthologous genes search panel is an important resource to obtain sesame orthologs in other species. This function is useful for researchers who want to investigate homolog cloning, homolog comparison, and evolution at the gene level. Last, the colinear regions search function provides homologous regions from the sesame genome to other species. Based on whole pseudomolecular chromosomes, colinear blocks can be visualized as a comparative chromosomal graphic, and lists of colinear regions are shown along with their chromosomal position. This function can aid researchers in their understanding of the colinear and evolutionary relationships between the sesame genome and corresponding species efficiently.

BLAST The similarity sequence search function, which embeds a standard NCBI BLAST software package (Altschul et al. 1997), will assist users in retrieving homologous genome components or regions in Sinbase. Users can upload a sequence file or directly paste the nucleic acid or amino acid

12

Downloaded from http://pcp.oxfordjournals.org/ at Monash University on December 7, 2014

the genomic component search accounts for a large proportion of search functions, which can give

sequence(s) into a box to search against all available databases by choosing the correct BLAST program (depending on the sequence type). Thus, this function aids users in extracting annotations of query sequences using the sesame genome data compiled in Sinbase by quick match. By inputting a nucleic acid or amino acid sequence, the similarity sequence search function can also allow users to find the location of query sequences on scaffolds, genetic linkage groups, or pseudomolecular

Discussion Few public resources exist on sesame genome information, apart from a sesame genome project Web site and certain transcriptomic data and genome survey sequences from the NCBI (Pruitt et al. 2007). Sinbase is the first practical and integrated database that focuses on S. indicum genomics, genetics, and comparative genomics with its relatives and other important species in the plant community. The deposited genomic sequences and relatively accurate genomic component annotations will be convenient for users to study functional genomics and comparative genomics of sesame simultaneously. Compared to another public resource platform for S. indicum genomic data (http://www.sesamegenome.org/), Sinbase supplies more comprehensive genomic data sets and

detailed genomic component annotations to allow users to study sesame more thoroughly. Genetic linkage groups, which can be explored by a highly visual style, will be valuable resource for gene cloning, quantitative trait locus (QTL), and genetic linkage analyses. Colinear blocks and orthologous genes will be useful resources for researchers to perform comparative and evolutionary analyses with sesame and other species. Additionally, the user-friendly interface allows users to conveniently

13

Downloaded from http://pcp.oxfordjournals.org/ at Monash University on December 7, 2014

chromosomes.

access Sinbase. This database will be improved and updated continuously with new features, along with improvements to genome annotations and genomic sequences in the sesame genome.

Conclusions and perspective To facilitate functional and comparative genomic studies in sesame, we developed a comprehensive

from sesame. Genomic sequences, genetic linkage groups, and colinear regions have been compiled into an integrated sesame genome database. Sinbase consists of two primary functions: a ‘browse’ section to show data collected in the database and multifunction search systems to display search results. We hope that Sinbase (a comprehensive data resource) will contribute meaningfully not only to functional genomics research, but also to evolutionary biology and molecular breeding.

Funding information

This work was supported by the National Natural Science Foundation of China [no. 31101182], the National Basic Research Program of China [973 Program, no. 2011CB109304], and the National Crop Germplasm Resources Infrastructure [NB2012-2130135].

Disclosures

The authors have no conflicts of interest to declare.

14

Downloaded from http://pcp.oxfordjournals.org/ at Monash University on December 7, 2014

database of the S. indicum genome, which includes genomic, genetic, and comparative genomic data

References

Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., et al. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25: 3389-3402. Ashburner, M., Ball, C.A., Blake, J.A., Botstein, D., Butler, H., Cherry, J.M., et al. (2000) Gene

Barrett, T., Troup, D.B., Wilhite, S.E., Ledoux, P., Rudnev, D., Evangelista, C., et al. (2009) NCBI GEO: archive for high-throughput functional genomic data. Nucleic Acids Res. 37: D885-D890. Budowski, P., Menezes, F.G. and Dollear, F. (1950) Sesame oil. V. The stability of sesame oil. J. Am. Oil Chem. Soc. 27: 377-380. Chen, P.R., Chien, K.L., Su, T.C., Chang, C.J., Liu, T.-L., Cheng, H., et al. (2005) Dietary sesame reduces serum cholesterol and enhances antioxidant capacity in hypercholesterolemia. Nutr Res 25: 559-567. Donlin, M.J. (2009) Using the generic genome browser (GBrowse). Current Protocols in Bioinformatics: 9.9. 1-9.9. 25. Hietaniemi, J. (2008) Comprehensive perl archive network. Helsinki, Finland: CPAN 6. Hirata, F., Fujita, K., Ishikura, Y., Hosoda, K., Ishikawa, T. and Nakamura, H. (1996) Hypocholesterolemic effect of sesame lignan in humans. Atherosclerosis 122: 135-136. Huala, E., Dickerman, A.W., Garcia-Hernandez, M., Weems, D., Reiser, L., LaFond, F., et al. (2001) The Arabidopsis Information Resource (TAIR): a comprehensive database and web-based

15

Downloaded from http://pcp.oxfordjournals.org/ at Monash University on December 7, 2014

ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 25: 25-29.

information retrieval, analysis, and visualization system for a model plant. Nucleic Acids Res 29: 102-105. Jaillon, O., Aury, J.M., Noel, B., Policriti, A., Clepet, C., Casagrande, A., et al. (2007) The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla. Nature 449: 463-U465.

genomes to life and the environment. Nucleic Acids Res 36: D480-484. Krzywinski, M., Schein, J., Birol, I., Connors, J., Gascoyne, R., Horsman, D., et al. (2009) Circos: an information aesthetic for comparative genomics. Genome Res 19: 1639-1645. McKay, S.J., Vergara, I.A. and Stajich, J.E. (2010) Using the generic synteny browser (GBrowse_syn). Current Protocols in Bioinformatics: 9.12. 11-19.12. 25. O'Donovan, C., Martin, M.J., Gattiker, A., Gasteiger, E., Bairoch, A. and Apweiler, R. (2002) Highquality protein knowledge resource: SWISS-PROT and TrEMBL. Brief Bioinform 3: 275-284. Pruitt, K.D., Tatusova, T. and Maglott, D.R. (2007) NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 35: D61-D65. Quevillon, E., Silventoinen, V., Pillai, S., Harte, N., Mulder, N., Apweiler, R., et al. (2005) InterProScan: protein domains identifier. Nucleic Acids Res 33: W116-120. Schmutz, J., Cannon, S.B., Schlueter, J., Ma, J., Mitros, T., Nelson, W., et al. (2010) Genome sequence of the palaeopolyploid soybean. Nature 463: 178-183.

16

Downloaded from http://pcp.oxfordjournals.org/ at Monash University on December 7, 2014

Kanehisa, M., Araki, M., Goto, S., Hattori, M., Hirakawa, M., Itoh, M., et al. (2008) KEGG for linking

Spandana, B., Reddy, V.P., Prasanna, G.J., Anuradha, G. and Sivaramakrishnan, S. (2012) Development and characterization of microsatellite markers (SSR) in Sesamum (Sesamum indicum L.) species. Appl. Biochem. Biotechnol. 168: 1594-1607. Suh, M.C., Kim, M.J., Hur, C.G., Bae, J.M., Park, Y.I., Chung, C.H., et al. (2003) Comparative analysis of expressed sequence tags from Sesamum indicum and Arabidopsis thaliana developing

Tang, H., Bowers, J.E., Wang, X., Ming, R., Alam, M. and Paterson, A.H. (2008) Synteny and collinearity in plant genomes. Science 320: 486-488. The Angiosperm Phylogeny, G. (2009) An update of the Angiosperm Phylogeny Group classification for the orders and families of flowering plants: APG III. Bot. J. Linn. Soc. 161: 105-121. The Tomato Genome Consortium (2012) The tomato genome sequence provides insights into fleshy fruit evolution. Nature 485: 635-641. Uzun, B. and Çağırgan, M.İ. (2009) Identification of molecular markers linked to determinate growth habit in sesame. Euphytica 166: 379-384. Uzun, B., Lee, D., Donini, P. and ÇAǧirgan, M. (2003) Identification of a molecular marker linked to the closed capsule mutant trait in sesame using AFLP. Plant Breeding 122: 95-97. Wang, L., Yu, S., Tong, C., Zhao, Y., Liu, Y., Song, C., et al. (2014) Genome sequencing of the high oil crop sesame provides insight into oil biosynthesis. Genome Biol. 15: R39. Wang, L., Zhang, Y., Qi, X., Gao, Y. and Zhang, X. (2012a) Development and characterization of 59 polymorphic cDNA-SSR markers for the edible oil crop Sesamum indicum (Pedaliaceae). Am J Bot 99: e394-398.

17

Downloaded from http://pcp.oxfordjournals.org/ at Monash University on December 7, 2014

seeds. Plant Mol Biol 52: 1107-1123.

Wang, L., Zhang, Y., Qi, X., Li, D., Wei, W. and Zhang, X. (2012b) Global gene expression responses to waterlogging in roots of sesame (Sesamum indicum L.). Acta Physiol. Plant. 34: 2241-2249. Wei, W., Qi, X., Wang, L., Zhang, Y., Hua, W., Li, D., et al. (2011) Characterization of the sesame (Sesamum indicum L.) global transcriptome using Illumina paired-end sequencing and development of EST-SSR markers. BMC Genomics 12: 451.

traits in a diverse panel of Chinese sesame (Sesamum indicum L.) germplasm. J Integr Plant Biol 55: 745-758. Yi, D.K. and Kim, K.J. (2012) Complete chloroplast genome sequences of important oilseed crop Sesamum indicum L. PLoS One 7: e35872. Zhang, H., Miao, H., Wei, L., Li, C., Zhao, R. and Wang, C. (2013) Genetic analysis and QTL mapping of seed coat color in sesame (Sesamum indicum L.). PLoS One 8: e63898. Zhang, H., Wei, L., Miao, H., Zhang, T. and Wang, C. (2012) Development and validation of genicSSR markers in sesame by RNA-seq. BMC Genomics 13: 316.

18

Downloaded from http://pcp.oxfordjournals.org/ at Monash University on December 7, 2014

Wei, W., Zhang, Y., Lu, H., Li, D., Wang, L. and Zhang, X. (2013) Association analysis for quality

Tables Table 1 Statistics of S. indicum genome components in Sinbase Categories

Putative genes

Repeat elements

Noncoding RNAs

Total genome components

LG01

2,024

18,719

89

20,832

LG02

1,934

18,852

77

20,863

LG03

2,709

35,743

122

38,574

1,679

19,264

59

21,002

1,679

34,056

93

35,828

LG06

2,745

36,526

123

39,394

LG07

1,260

14,310

162

15,732

LG08

2,535

18,712

143

21,390

LG09

1,300

12,814

69

14,183

LG10

1,792

24,803

127

26,722

LG11

1,617

17,892

81

19,590

LG12

1,017

4,066

46

5,129

LG13

522

7,769

36

8,327

LG14

524

3,178

33

3,735

LG15

1,187

15,229

112

16,528

LG16 Unknown

372 2,252

7,033 83,201

25 351

7,430 85,804

19

Downloaded from http://pcp.oxfordjournals.org/ at Monash University on December 7, 2014

LG04 LG05

Legends to figures

Fig. 1 Flowchart of the Sinbase sitemap.

Fig. 2 Annotation of genomic components in the S. indicum genome. A. Overview of genomic components in Sinbase. B. Basic information on pseudomolecular chromosomes. C. Genome browse of pseudomolecular chromosomes. D. Putative gene list on pseudomolecular chromosomes. E. Repeat element list on pseudomolecular chromosomes. F. List of

of repeat elements. I. Annotation of noncoding RNAs.

Fig. 3 Colinear regions of S. indicum compared to corresponding genomes. A. Colinear regions displayed by Circos software. B. Colinear regions list. C. Certain colinear regions shown by GBrowse_syn software. D. Orthologous gene pairs on colinear regions.

20

Downloaded from http://pcp.oxfordjournals.org/ at Monash University on December 7, 2014

noncoding RNAs on pseudomolecular chromosomes. G. Annotation of putative genes. H. Annotation

Downloaded from http://pcp.oxfordjournals.org/ at Monash University on December 7, 2014

11x6mm (600 x 600 DPI)

Downloaded from http://pcp.oxfordjournals.org/ at Monash University on December 7, 2014

143x94mm (300 x 300 DPI)

Downloaded from http://pcp.oxfordjournals.org/ at Monash University on December 7, 2014

78x55mm (300 x 300 DPI)

Sinbase: an integrated database to study genomics, genetics and comparative genomics in Sesamum indicum.

Sesame (Sesamum indicum L.) is an ancient and important oilseed crop grown widely in tropical and subtropical areas. It belongs to the gigantic order ...
508KB Sizes 0 Downloads 6 Views