Plant and Cell Physiology Advance Access published November 28, 2014

MTGD: The Medicago truncatula genome database

Vivek Krishnakumar¹, Maria Kim¹, Benjamin D. Rosen¹, Svetlana Karamycheva¹, Shelby L. Bidwell¹, Haibao Tang¹, Christopher D. Town¹

¹ Plant Genomics Group, J. Craig Venter Institute, 9704 Medical Center Dr, Rockville, MD 20850, USA

Abbreviations: BLAST, Basic Local Alignment Search Tool; CDS, Coding Sequence; eFP, Electronic Fluorescent Pictograph; EST, Expressed Sequence Tag; iDS, iPlant Data Store; GFF3, Generic Feature Format 3; GMOD, Generic Model Organism Database; GO, Gene Ontology; JCVI, J. Craig Venter Institute; JSON, JavaScript Object Notation; KEGG, Kyoto Encyclopedia of Genes and Genomes; MODs, Model Organism Databases; MVC, Model View Controller; REST, Representational State Transfer; SIFTER, Statistical Inference of Function Through Evolutionary Relationships; TIGR, The Institute for Genomic Research; WGS, Whole Genome Shotgun

Downloaded from http://pcp.oxfordjournals.org/ at Mississippi State University Libraries on December 23, 2014

Abstract

Medicago truncatula, a close relative of alfalfa (Medicago sativa), is a model legume used for studying symbiotic nitrogen fixation, mycorrhizal interactions and legume genomics. JCVI (formerly TIGR) has been involved in M. truncatula genome sequencing and annotation since 2002 and has maintained a web-based resource providing data to the community for this entire period. The web site (http://www.MedicagoGenome.org) has seen major updates in the past year, where it currently hosts the latest version of the genome (Mt4.0), associated data and legacy project information, presented to users via a rich set of open-source tools. A JBrowse-based genome browser interface exposes tracks for visualization. Mutant gene symbols originally assembled and curated by the Frugoli lab are now hosted at JCVI and tie into our community annotation interface, Medicago EuCAP (to be integrated soon with our implementation of WebApollo). Literature pertinent to M. truncatula is indexed and made searchable via the Textpresso search engine. The site also implements MedicMine, an instance of InterMine that offers interconnectivity with other plant “mines” like ThaleMine and PhytoMine, and other Model Organism Databases (MODs). In addition to these new features, we continue to provide keyword and locus identifier based searches served via a Chado-backed Tripal instance, a BLAST search interface, and bulk downloads of datasets from the iPlant Data Store (iDS). Finally, we maintain an email helpdesk, facilitated by a JIRA issue tracking system, where we receive and respond to questions about the website and requests for specific datasets from the community.

Keywords

Medicago truncatula, legume, genome sequence, database, visualization, search

© The Author 2014. Published by Oxford University Press on behalf of Japanese Society of Plant Physiologists. All rights reserved. For Permissions, please e-mail: [email protected]

Introduction

Leguminous plants, members of the Fabaceae family, are primarily cultivated for food grain seed and livestock forage, and are considered good dietary sources of protein, since their protein content ranges from 20-40%. Prominent among crop legumes are soybean (Glycine max), a major cash crop providing protein and oil worldwide, and alfalfa (Medicago sativa), the most extensively cultivated forage legume in the world. Legumes are particularly important in agriculture due to their ability to fix atmospheric nitrogen through the formation of root nodules harboring bacterial symbionts such as Sinorhizobium meliloti. Legumes cultivated worldwide fix over 50 million metric tons of nitrogen annually, equivalent to $10 billion of fertilizer.

Two species, Lotus japonicus and Medicago truncatula, have played prominent roles in fundamental research on legume biology and symbiotic nitrogen fixation due to favorable characteristics including diploid genetics, small genomes (~500 Mbp), ease of transformation, short lifecycle and high levels of natural diversity (Cook 1999; Cook et al. 1997). The genomes of both these models were sequenced in the first decade of the 21st century along with the commercially valuable soybean (Sato et al. 2008; Schmutz et al. 2010; Young et al. 2011). Collectively, these three genomes have provided researchers with solid platforms upon which to base their research, and access to a high-quality Medicago reference genome and annotation remains a foundation for legume researchers. These resources have served as enabling platforms for gene expression atlases (Benedito et al. 2008; Libault et al. 2010; Verdier et al. 2013), mutant collections and databases (Perry et al. 2003; Tadege et al. 2008) and as references for many other legume species with less well-studied genomes (Mochida and Shinozaki 2011).

Sequencing of the euchromatic (gene-rich) portion of the M. truncatula genome, backed by the National Science Foundation (NSF) and the European Union’s Sixth Framework Program, began in 2003. Of the eight chromosomes, six were sequenced in the US and two in Europe (chromosome 5 by Genoscope in France and chromosome 3 by the Sanger Centre in the UK). A Sanger-based BAC-by-BAC assembly totaling ~250 Mbp in sequence, supplemented with an additional ~100 Mbp of short contigs derived from de novo assembled Illumina reads, was constructed and published as the first release of the complete genome (version Mt3.5), the annotation and analysis of which was orchestrated by the International Medicago Genome Annotation Group (IMGAG, http://www.medicago.org/genome/IMGAG) (Young et al. 2011).

Additional funding from the NSF supported efforts aimed at completing and curating the genome using Next Generation Sequencing (NGS) technologies, and its annotation with the support of the research community. This resulted in a much-improved hybrid genome assembly, consisting of a whole genome shotgun (WGS) based backbone, reconciled using high quality regions from the BAC-based Mt3.5 assembly. Optical mapping technology (Zhou et al. 2007), in combination with genetic maps generated via genotyping-by-sequencing (GBS) (Elshire et al. 2011), was used to recruit sequences for the construction of the chromosome pseudomolecules, now spanning ~384 Mbp plus an unanchored sequence set of ~28 Mbp. This hybrid genome assembly was annotated in-house at JCVI, using new ab initio prediction methods and more transcriptomic and proteomic evidence (Volkening et al. 2012). Annotations were generated using EVidenceModeler (Haas et al. 2008) and MAKER (Cantarel et al. 2008), to improve existing gene calls and instantiate novel genes, supplemented with custom curated gene sets from community collaborators (Zhou et al. 2013). This resulted in the Mt4.0 release (Tang et al. 2014) consisting of 50,894 annotated gene loci (classified into 31,661 high confidence and 19,233 low confidence genes), which was made available via GenBank (as an update to the Mt3.5 release) and our website.

This paper describes a feature-rich genome resource for Medicago truncatula, which exposes a variety of useful web-based tools and services to query, consume and visualize the data with the continuing goal of supporting the legume research community. Analogous resources exist for L. japonicus (http://www.kazusa.or.jp/lotus) and G. max (http://soybase.org).

Database contents

MTGD integrates a rich set of current generation open-source web-based tools, primarily developed by the GMOD community, to facilitate easy and user-friendly access to the data. These include a data query engine based on Tripal (Sanderson et al. 2013) which uses a Chado database, genome visualization facilitated by JBrowse (Skinner et al. 2009), sequence search capabilities using a BLAST server (Altschul et al. 1990), literature mining using Textpresso (Müller et al. 2004), a community annotation gateway via a tailored implementation of EuCAP (Thibaud-Nissen et al. 2007) and WebApollo (Lee et al. 2013), and access to processed bulk datasets (FASTA, GFF and other data files) served from the iDS (Goff et al. 2011).

We have also implemented MedicMine, an InterMine (Smith et al. 2012) interface to Medicago truncatula genome data, which provides a powerful query interface, exposes RESTful web services and supports interoperability with other plant “mines” (currently ThaleMine and PhytoMine) and Model Organism Databases (MODs) running InterMine.

The above stack of applications operates on a CentOS 6.5 based system under an Apache web server (Tomcat for the MedicMine service) and uses PostgreSQL (MySQL for the EuCAP service) as the underlying database management system. By working with this software stack and choosing to integrate the genomic datasets into a community-derived open-source database schema, MTGD ensures that researchers within the Medicago and wider legume community have access to familiar query and visualization interfaces, adopted by other well known plant genomics resources such as Phytozome (Goodstein et al. 2012). This promotes consistency and portability of data, interoperability and future maintainability and, in general, fosters open-source development.

Interrogating the Medicago genome data via a query interface

In the earlier phases of this project, the genomic data generated was hosted and served to users via a custom, in-house developed (TIGR/JCVI specific) eukaryotic database schema and query interface. In the recent past, this data was migrated into Tripal (Figure 1A), an interface combining the capabilities of the Drupal content management system (https://www.drupal.org) and data served from Chado, an open-source database schema used for storage of genomic, genetic and related biological data. Both the Drupal and Chado databases are maintained in PostgreSQL, while the front-end pages are served via custom PHP scripts.

Every entity associated with the genome (organism, gene, analysis, etc.) is represented as a page (or node) in Tripal. Each gene report page (Figure 1B) provides the user with a holistic view of the metadata associated with the entity (gene description, Affy probeset mapping, locus identifier), a sequence viewer (Figure 1C) with its different genomic elements (mRNA, exon, CDS) highlighted using unique colors, and analysis results. These include protein domain composition identified by InterProScan (Figure 1D), BLAST homology search results (Figure 1E) against the NCBI non-redundant (nr) database, GO annotations (Figure 1F) inferred independently from InterPro2GO (Hunter et al. 2009) and the PhyloFun pipeline (Hallab and Schoof 2012), which uses the SIFTER phylogenomic tool (Engelhardt et al. 2005) to propagate GO terms through statistical inference, and KEGG pathways (Kanehisa and Goto 1999) (Table 1). The full-text search capabilities of Drupal can be used to query and peruse the database content by locus identifier (example: “Medtr5g024510”) or keywords (gene description, GO terms, IPR domains, etc.).

The BLAST search interface provides access to datasets from the current genome release Mt4.0 by default (the earlier Mt3.5 version is also made available for users needing access to legacy data), allowing users to conduct sequence searches against the coding sequences, transcripts, proteins, and the whole genome assembly, the results of which are presented in a familiar graphical format as well as a full text alignment.

Visualizing the Medicago genome

Users who wish to visualize pre-computed genomic datasets (gene structures, orthologs from related plant species, EST/protein alignment evidence, etc.) can make use of the powerful web-based genome visualization tool, JBrowse. In contrast to its predecessor, GBrowse (Stein 2013), JBrowse is built using the latest web technologies (HTML5, CSS3, AJAX), providing a highly intuitive interface with easy-to-use panning and zooming controls for navigation as well as the ability to select and organize tracks to be visualized via drag-and-drop functionality (Figure 2A). Track data is served using JSON formatted files, pre-generated using parsers packaged within the JBrowse source code repository that operate on input FASTA/GFF3/BAM files, and hosted on the web-server.

Apart from representing the genome annotation, our JBrowse instance exposes tracks of transposable elements, evidence (gene predictions, EST, RNA-seq and protein alignments), orthologs from other related species (computed as reciprocal best BLAST hits), synteny with related legume genomes (Medicago, chickpea, Lotus) and RNA-seq read coverage per tissue type (Table 1).

Track data can be easily downloaded in commonly utilized biological data formats such as GFF3, BED (representing genomic intervals) and FASTA (representing sequences), either pertaining to a specific sequence feature or the underlying genome sequence region (making use of the “Reference Sequence” track). Seamless sharing of the currently visible region with collaborators is facilitated by the encoding of all information about the view into the URL, which updates itself dynamically as the user pans to move around the genome. Thus at any given point, the visible URL (accessible either via the browser address bar or the “Share” button), when shared with a colleague located anywhere in the world, will display the exact same region of the genome and collection of open tracks as seen on the user’s screen.

This genome browser is also compatible with users’ personal datasets, which can easily be loaded into a temporary track directly from a local computer for secure instantaneous viewing (without having to transmit data to any remote server). This gives end users the power to stream very large datasets such as read alignments, normally stored in the SAM/BAM format (Li et al. 2009) (Figure 2B), and the privacy to visualize their own datasets alongside the reference Medicago genome annotation.
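The URL-based view sharing described in this section can be illustrated with a minimal Python sketch. It assumes the JBrowse convention of passing the visible region as a `loc` query parameter and the open tracks as a comma-separated `tracks` parameter; the base URL, the reference name `chr5` and the track labels are hypothetical placeholders, not the actual values used on the MTGD site.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical base URL; the real JBrowse location on the MTGD site may differ.
JBROWSE_BASE = "https://example.org/jbrowse/index.html"


def build_view_url(ref, start, end, tracks, base=JBROWSE_BASE):
    """Encode a region and an ordered list of track labels into a shareable URL."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    params = {
        "loc": f"{ref}:{start}..{end}",  # region string, e.g. chr5:1000..2000
        "tracks": ",".join(tracks),      # comma-separated track labels
    }
    return f"{base}?{urlencode(params)}"


def parse_view_url(url):
    """Recover (ref, start, end, tracks) from a URL built by build_view_url."""
    query = parse_qs(urlsplit(url).query)
    ref, span = query["loc"][0].rsplit(":", 1)
    start, end = (int(x) for x in span.split(".."))
    # parse_qs drops empty values, so a view with no tracks yields an empty list.
    tracks = query["tracks"][0].split(",") if "tracks" in query else []
    return ref, start, end, tracks


if __name__ == "__main__":
    # Track labels below are illustrative only.
    url = build_view_url("chr5", 1000, 2000, ["DNA", "gene_models"])
    print(url)
    print(parse_view_url(url))
```

Because the whole view state lives in the query string, a colleague who opens the URL recovers the same region and track list without any server-side session.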

Leveraging community expertise to help curate the genome

Since Medicago has been adopted as the model system for legume genetic and genome analysis of development during microbial interactions and several other aspects of legume biology, researchers now have access to a large collection of mutants and ecotypes. Increasing numbers of genes characterized by cloning and those identified by mutation and ecotypic variation necessitate an organized approach to mutant symbol nomenclature, rules for which were developed (VandenBosch and Frugoli 2001) and curated by Julia Frugoli. These data have now been integrated with our resource and tie into the community annotation portal.

The community annotation portal is a tailored implementation of EuCAP (Thibaud-Nissen et al. 2007) that was initially developed as part of the Rice Genome Annotation initiative. It has been revamped visually and functionally by developing a new module to support the curation of existing and novel mutant gene symbols (with the encoded nomenclature rules). The underlying database tracks all modifications on a per-user basis, allowing an in-house curator to review and accept changes as and when necessary.

Registered users can suggest updates to gene functional annotation and metadata such as publications associated with the gene, mutant-to-gene symbol associations, etc. Researchers from the community who have had prior experience characterizing specific gene families are enrolled to work on functionally annotating the family members (Figure 3). Users can also suggest modifications to gene structure by submitting published FL-cDNA sequences that are mapped to the reference (by an in-house curator) using a spliced aligner to reveal the proposed gene structure, the results of which are loaded into the database for visualization by the client. We intend to replace this module with an implementation of WebApollo, which gives users the capability to perform a live sequence search against the reference and use the resulting spliced alignment to suggest gene structure modifications.

The design of the community annotation portal follows the Model-View-Controller (MVC) paradigm, where (a) the Model corresponds to the MySQL database used to track input from the community using a custom schema abstracted using the Perl module Class::DBI; (b) the View is controlled by Perl/CGI code used to render content on the pages, making use of the Template toolkit for content management and the Bootstrap theme for styling; and (c) the Controller handles the business logic of updating the Model based on input from the user and sending responses back to the View to update the user.

Mining Medicago related literature with Textpresso

Literature plays a major role in scientific communication among researchers, fostering exchanges of findings and sharing of detailed experimental methods. Given the rate of growth of published scientific literature, there is a need for computational methods to parse the literature and provide researchers with relevant information. Textpresso offers access to full text, so that the entire content of articles can be searched, and exposes flexible search capabilities allowing users to query the literature using categories of biological concepts/classes that relate two objects (such as association, regulation, etc.) or identify one (such as gene, allele, etc.). Our current implementation of Textpresso allows users to query ~3700 full-text papers relevant to Medicago truncatula.

The Textpresso database is built by querying NCBI PubMed and retrieving a list of PMIDs associated with the query “Medicago truncatula”. Using these PMIDs, Perl scripts (part of the Textpresso package) are used to query and retrieve all available pieces of information pertaining to any given PMID: title, abstract, keywords, introduction, body, discussion, results, and references. These individual pieces of data are then cataloged and stored in a hierarchical manner on the file system and made queryable through the keywords field in the search interface, which is built using Perl/CGI.

Facilitating cross-species integration using InterMine

MedicMine, our implementation of the open source data warehouse application, InterMine (Smith et al. 2012), hosts Medicago genome and annotation data along with other datasets such as proteins from UniProt, gene function based on GO terms, gene and isoform level expression from tissue-specific RNA-seq data, orthologs from other plant species and an electronic fluorescent pictograph (eFP) (Winter et al. 2007) for visual representation of gene expression level in different tissues (Benedito et al. 2008) (Figure 4; Table 1). InterMine, with its powerful query interface, enables users to build and execute complex searches using the “Query Builder” by joining across multiple datasets, build and store “Lists” of interesting genes for downstream analyses, and create and save commonly executed queries as “Templates”. InterMine also seamlessly runs queries across remote mine instances to locate homologs in other species. Based on homology information, MedicMine offers users the capability to link to Arabidopsis thaliana data hosted at ThaleMine (maintained by the Arabidopsis Information Portal project, https://www.araport.org), and a wide variety of plant genomes hosted at PhytoMine (maintained by Phytozome at JGI, http://phytozome.jgi.doe.gov/).

Access to bulk data downloads

Our website also offers users the capability to download and consume the data in the form of flat-files. We maintain commonly used and requested datasets such as Genome FASTA, GFF3 containing the annotated gene models, Transposable Elements and predicted tRNAs, Gene/CDS/Protein FASTA, and upstream/downstream (of the transcription or translation start sites) sequences, pertaining to the most recent genome release as well as previous versions. All data are hosted and accessible publicly via the iDS infrastructure.

Table 1: Summary of data sets used to populate Tripal/Chado, JBrowse, EuCAP, Textpresso and InterMine

Data Set | Data Source | Data Source URL
Genome Assembly & Genome Annotation | JCVI Plant Genomics | http://www.medicagogenome.org
Affy Probe Mapping | Noble Foundation | http://www.noble.org/plantbio/zhao/
Electronic Fluorescent Pictograph | Bio-Analytic Resource (BAR) | http://bar.utoronto.ca/efpmedicago/cgi-bin/efpWeb.cgi
Proteins | UniProt | http://www.uniprot.org
Protein Domains | InterPro | http://www.ebi.ac.uk/interpro/
GO Annotation | InterPro2GO & PhyloFun | http://www.ebi.ac.uk/GOA/InterPro2GO; https://github.com/groupschoof/PhyloFun
Mutant Loci | Julia Frugoli, Clemson | http://www.genome.clemson.edu/affiliated_cugi/medicago/genelist.html
Orthologs | Reciprocal BLAST against Ath, Lja, Gma proteomes | http://blast.ncbi.nlm.nih.gov/Blast.cgi
RNA-seq | Sequence Read Archive (SRA) | http://www.ncbi.nlm.nih.gov/sra
Pathways | KEGG | http://www.genome.jp/kegg/pathway.html
Publications | PubMed | http://www.ncbi.nlm.nih.gov/pubmed
Synteny Blocks | Legume Information System (LIS) | https://www.comparative-legumes.org

Training and Outreach efforts

Our web site maintains a help desk for processing requests received from the community. Requests consist mainly of questions about web-based resources, researchers asking for specialized datasets or seeking help with any of the tools made available through our portal. Helpdesk activity is managed using a JIRA-based tracking system, which is set up to automatically create tickets based on emails sent to [email protected].

As the community outreach component of the NSF grant, the project ran annual informatics training workshops. All members of the team at JCVI were involved in organizing and conducting these Plant Bioinformatics Workshops, open to researchers (college students, post-docs, professors and industry researchers) from all over the USA and the world. The workshop study material covered important concepts such as Unix for bioinformatics, genome assembly & annotation methods, RNA-seq data analysis, SNP calling, and comparative genomics. All materials were presented in the form of didactic sessions interspersed with hands-on learning modules. Remote attendee presence was managed via WebEx conferencing and their compute needs managed via pre-provisioned nodes on the Amazon Elastic Compute Cloud (EC2) platform. All the workshop exercises are being made available as a virtual machine image accessible via the iPlant Atmosphere cloud-computing infrastructure. Users can log in to the Atmosphere console with iPlant credentials, search for the “JCVI Plant Bioinformatics Workshop” image and spawn an instance. Links to the presentations and the hands-on material will be made available in the description of the Atmosphere image listing.

Conclusions

The Medicago truncatula Genome Database (MTGD) hosts a wide variety of genomic datasets. Building upon the significant effort that went into improving the genome assembly and annotation, this web site aims to be the hub, providing end-users with the most up to date and unrestricted access to the genome data by exposing intuitive capabilities to search, visualize, download, mine literature and perform cross-species queries. Integrating all these resources into one portal, and providing link-outs to other useful legume and plant-specific resources, will facilitate and greatly aid the ever-growing and active worldwide legume research community.

Funding

This work was supported by the National Science Foundation (NSF) Division of Integrative Organismal Systems award [IOS-0821966].

Disclosures

Conflicts of interest: No conflicts of interest declared.

Acknowledgments

The authors would like to thank Julia Frugoli for developing the Medicago mutant nomenclature rules and sharing the data with us. We would also like to thank members of the Medicago research community for assisting with the curation of gene families and providing valuable feedback during the course of this project. Finally, we would like to acknowledge the JCVI Information Technology department for providing the infrastructure support.

References

Altschul, S.F., Gish, W., Miller, W., Myers, E.W. and Lipman, D.J. (1990) Basic local alignment search tool. Journal of molecular biology 215: 403-410.
Benedito, V.A., Torres-Jerez, I., Murray, J.D., Andriankaja, A., Allen, S., Kakar, K., et al. (2008) A gene expression atlas of the model legume Medicago truncatula. The Plant journal : for cell and molecular biology 55: 504-513.
Cantarel, B.L., Korf, I., Robb, S.M., Parra, G., Ross, E., Moore, B., et al. (2008) MAKER: an easy-to-use annotation pipeline designed for emerging model organism genomes. Genome research 18: 188-196.
Cook, D.R. (1999) Medicago truncatula--a model in the making! Current opinion in plant biology 2: 301-304.
Cook, D.R., VandenBosch, K., de Bruijn, F.J. and Huguet, T. (1997) Model Legumes Get the Nod. Plant Cell 9: 275-280.
Elshire, R.J., Glaubitz, J.C., Sun, Q., Poland, J.A., Kawamoto, K., Buckler, E.S., et al. (2011) A robust, simple genotyping-by-sequencing (GBS) approach for high diversity species. PloS one 6: e19379.
Engelhardt, B.E., Jordan, M.I., Muratore, K.E. and Brenner, S.E. (2005) Protein molecular function prediction by Bayesian phylogenomics. PLoS computational biology 1: e45.
Goff, S.A., Vaughn, M., McKay, S., Lyons, E., Stapleton, A.E., Gessler, D., et al. (2011) The iPlant Collaborative: Cyberinfrastructure for Plant Biology. Front Plant Sci 2.
Haas, B.J., Salzberg, S.L., Zhu, W., Pertea, M., Allen, J.E., Orvis, J., et al. (2008) Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome biology 9: R7.
Hallab, A. and Schoof, H. (2012) PhyloFun: Function prediction of uncharacterized proteins using their evolutionary history. GitHub, https://github.com/groupschoof/PhyloFun.
Hunter, S., Apweiler, R., Attwood, T.K., Bairoch, A., Bateman, A., Binns, D., et al. (2009) InterPro: the integrative protein signature database. Nucleic acids research 37: D211-215.
Kanehisa, M. and Goto, S. (1999) KEGG: kyoto encyclopedia of genes and genomes. Nucleic acids research 28: 27-30.
Lee, E., Helt, G.A., Reese, J.T., Munoz-Torres, M.C., Childers, C.P., Buels, R.M., et al. (2013) Web Apollo: a web-based genomic annotation editing platform. In Genome biology. BioMed Central Ltd.
Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., et al. (2009) The Sequence Alignment/Map format and SAMtools. Bioinformatics 25: 2078-2079.
Libault, M., Farmer, A., Joshi, T., Takahashi, K., Langley, R.J., Franklin, L.D., et al. (2010) An integrated transcriptome atlas of the crop model Glycine max, and its use in comparative analyses in plants. The Plant Journal 63: 86-99.
Mochida, K. and Shinozaki, K. (2011) Advances in Omics and Bioinformatics Tools for Systems Analyses of Plant Functions. Plant and Cell Physiology 52: 2017-2038.
Müller, H.-M., Kenny, E.E. and Sternberg, P.W. (2004) Textpresso: An Ontology-Based Information Retrieval and Extraction System for Biological Literature. PLoS Biol 2: e309.
Perry, J.A., Wang, T.L., Welham, T.J., Gardner, S., Pike, J.M., Yoshida, S., et al. (2003) A TILLING Reverse Genetics Tool and a Web-Accessible Collection of Mutants of the Legume Lotus japonicus. Plant Physiology 131: 866-871.
Sanderson, L.-A., Ficklin, S.P., Cheng, C.-H., Jung, S., Feltus, F.A., Bett, K.E., et al. (2013) Tripal v1.1: a standards-based toolkit for construction of online genetic and genomic databases. Database 2013.
Sato, S., Nakamura, Y., Kaneko, T., Asamizu, E., Kato, T., Nakao, M., et al. (2008) Genome Structure of the Legume, Lotus japonicus. DNA Research 15: 227-239.
Schmutz, J., Cannon, S.B., Schlueter, J., Ma, J., Mitros, T., Nelson, W., et al. (2010) Genome sequence of the palaeopolyploid soybean. Nature 463: 178-183.
Skinner, M.E., Uzilov, A.V., Stein, L.D., Mungall, C.J. and Holmes, I.H. (2009) JBrowse: a next-generation genome browser. Genome research 19: 1630-1638.
Smith, R.N., Aleksic, J., Butano, D., Carr, A., Contrino, S., Hu, F., et al. (2012) InterMine: a flexible data warehouse system for the integration and analysis of heterogeneous biological data. Bioinformatics 28: 3163-3165.
Stein, L.D. (2013) Using GBrowse 2.0 to visualize and share next-generation sequence data. Briefings in bioinformatics 14: 162-171.
Tadege, M., Wen, J., He, J., Tu, H., Kwak, Y., Eschstruth, A., et al. (2008) Large-scale insertional mutagenesis using the Tnt1 retrotransposon in the model legume Medicago truncatula. The Plant Journal 54: 335-347.
Tang, H., Krishnakumar, V., Bidwell, S., Rosen, B., Chan, A., Zhou, S., et al. (2014) An improved genome release (version Mt4.0) for the model legume Medicago truncatula. BMC genomics 15: 312.
Thibaud-Nissen, F., Campbell, M., Hamilton, J.P., Zhu, W. and Buell, C.R. (2007) EuCAP, a Eukaryotic Community Annotation Package, and its application to the rice genome. BMC genomics 8: 388.
VandenBosch, K.A. and Frugoli, J. (2001) Guidelines for Genetic Nomenclature and Community Governance for the Model Legume Medicago truncatula. Molecular Plant-Microbe Interactions 14: 1364-1367.
Verdier, J., Torres-Jerez, I., Wang, M., Andriankaja, A., Allen, S.N., He, J., et al. (2013) Establishment of the Lotus japonicus Gene Expression Atlas (LjGEA) and its use to explore legume seed maturation. The Plant Journal 74: 351-362.
Volkening, J.D., Bailey, D.J., Rose, C.M., Grimsrud, P.A., Howes-Podoll, M., Venkateshwaran, M., et al. (2012) A proteogenomic survey of the Medicago truncatula genome. Molecular & cellular proteomics : MCP 11: 933-944.
Winter, D., Vinegar, B., Nahal, H., Ammar, R., Wilson, G.V. and Provart, N.J. (2007) An "Electronic Fluorescent Pictograph" Browser for Exploring and Analyzing Large-Scale Biological Data Sets. In PloS one.
Young, N.D., Debelle, F., Oldroyd, G.E., Geurts, R., Cannon, S.B., Udvardi, M.K., et al. (2011) The Medicago genome provides insight into the evolution of rhizobial symbioses. Nature 480: 520-524.
Zhou, P., Silverstein, K.A., Gao, L., Walton, J.D., Nallu, S., Guhlin, J., et al. (2013) Detecting small plant peptides using SPADA (Small Peptide Alignment Discovery Application). BMC bioinformatics 14: 335.
Zhou, S., Bechner, M.C., Place, M., Churas, C.P., Pape, L., Leong, S.A., et al. (2007) Validation of rice genome sequence by optical mapping. BMC genomics 8: 278.

Figure Legends

Figure 1: The Medicago truncatula Genome Database portal. (A) Front page with links to the Tripal Search Interface (highlighted in red) and other important resources (MedicMine, EuCAP Community Annotation and Downloads). (B-F) Major elements of a Gene Report page featuring: (B) Annotation Overview (providing gene description, confidence class rating and TAIR Curator Summary derived from best Arabidopsis homolog) and cross-references to external databases (NCBI Protein, Mt Gene Expression Atlas); (C) annotated sequence viewer (showing the exon/CDS within the transcript); (D) InterPro domain composition; (E) BLAST homolog results (against the NCBI nr database); and (F) the GO Annotation.

Figure 2: Visualize the genome and supporting data using JBrowse. (A) Select tracks to view from the “Available Tracks” selector on the left. Drag the mouse or use the pan/zoom controls at the top to navigate to a locus of interest. (B) Using the “File → Open” dialog box: (i) “Select Files” from your local file system or from a remote URL for viewing. Once the appropriate files have been chosen, (ii) click “Open” to visualize the data immediately. (iii) In this example, read alignments in the form of a BAM file (with the supporting BAM index or bai file) are uploaded for viewing (iv).

Figure 3: Medicago EuCAP interface. (A) Curate a gene family: choosing to annotate a particular family member exposes a form requesting the proposed gene symbol, functional annotation, any literature associated with this gene and the gene-to-mutant symbol association, if available. (B) Curate a mutant locus: choosing to annotate a particular mutant exposes a form requesting background information about the characterized mutant, any associated literature and annotations of the mutant allele, if they exist.

Figure 4: The MedicMine data query interface. (A) Summary information of a gene model featuring Identifier, Gene Description, Gene Structure (exon, intron, CDS) and links to remote Mines featuring Plant Genomics datasets. (B) View genome functional annotation data like GO terms, Proteins from UniProt and Homologs from other plant species. (C) Gene Expression data visualized in the form of Electronic Fluorescent Pictograph (eFP) images loaded dynamically from the Bio-Analytic Resource webserver hosted at the University of Toronto, and estimated expression levels in FPKM based on RNA-seq data from different tissues.
