Progress, challenges and the future of crop genomes
Todd P Michael1 and Robert VanBuren2

The availability of plant reference genomes has ushered in a new era of crop genomics. More than 100 plant genomes have been sequenced since 2000, 63% of which are crop species. These genome sequences provide insight into architecture, evolution and novel aspects of crop genomes such as the retention of key agronomic traits after whole genome duplication events. Some crops have very large, polyploid, repeat-rich genomes, which require innovative strategies for sequencing, assembly and analysis. Even low quality reference genomes have the potential to improve crop germplasm through genome-wide molecular markers, which decrease expensive phenotyping and breeding cycles. The next stage of plant genomics will require draft genome refinement, building resources for crop wild relatives, resequencing broad diversity panels, and plant ENCODE projects to better understand the complexities of these highly diverse genomes.

Addresses
1 Ibis Biosciences, Carlsbad, CA, United States
2 Donald Danforth Plant Science Center, St. Louis, MO, United States

Corresponding author: VanBuren, Robert ([email protected])

Current Opinion in Plant Biology 2015, 24:71–81
This review comes from a themed issue on Genome studies and molecular genetics
Edited by Insuk Lee and Todd C Mockler

http://dx.doi.org/10.1016/j.pbi.2015.02.002
1369-5266/© 2015 Elsevier Ltd. All rights reserved.

Introduction
After the release of the Arabidopsis genome in 2000 [1] and the advent of Next Generation Sequencing (NGS) technology in 2005, the number of sequenced plant genomes increased rapidly to more than 100 ([2], List of sequenced plant genomes; URL: https://genomevolution.org/wiki/index.php/Sequenced_plant_genomes). Nearly two-thirds (63%) of the sequenced plant genomes are from crops, while model, non-model and crop wild relatives make up the remainder; three-fourths (76%) of the sequenced plant genomes are from eudicots and one-fifth (19%) are from monocots. Few genomes from non-flowering plants have been published thus far, with only three from the Gymnospermae, one from the Bryophyta and one from the Lycopodiophyta (Figure 1, Table 1).

The high throughput and low cost of NGS technologies have made it possible to sequence crops with lower economic value or large genomes, and have paved the way for establishing new model species. The complexity and size of some crop genomes made traditional Sanger sequencing cost prohibitive. The wheat genome, for instance, is hexaploid, 90% repetitive, and 17 gigabases (Gb), and the sugarcane genome ranges in ploidy up to decaploid, and its 12 Gb is 80% repetitive. Although sequencing capacity and computational power are increasing exponentially, numerous challenges remain, and both novel methodologies and legacy techniques are important to crack these impossible genomes.

Model plant genomes such as Arabidopsis [1], Brachypodium distachyon [3], Physcomitrella patens (moss [4]) and Setaria italica [5,6] serve as an engine for research, while others like Oryza sativa (rice [7,8]), Populus trichocarpa (poplar [9]), Zea mays (maize [10]), Glycine max (soybean [11]), Solanum lycopersicum (tomato [12]), and Pinus taeda (loblolly pine [13]) serve a dual purpose as both crops and functional models. Together these genomes have provided the foundation for an era of molecular genomics research that has enabled functional definition of many key genes and pathways.

Non-model and non-crop plant genomes provide important clues to plant genome architecture and the evolution of flowering plants. Although it was thought that plants have a ‘one-way ticket to genome obesity’ as a result of the retention of proliferating transposable elements (TEs) [14], the smallest plant genomes [15], Utricularia gibba (bladderwort) and Genlisea aurea (corkscrew), provided evidence that almost all intergenic space and repeat sequence can be purged [16,17]. In addition, the aquatic, highly morphologically reduced, non-grass monocot Spirodela polyrhiza (greater duckweed) has a genome similar in size to Arabidopsis yet functions with 28% fewer genes (19,623) [18]. The genomes of Selaginella moellendorffii (spikemoss [19]) and Amborella trichopoda [20] provide evolutionary links to the vascular plants and the angiosperms, respectively, yielding key insights into the trajectory of plant-specific gene families and the radiation of flowering plants.

In this review we focus primarily on the most recently sequenced specialty and row crop genomes, with an emphasis on the challenges and limitations of current genome sequencing techniques. This segues into downstream work aimed at linking the genome to the biology, and the review concludes with the future of plant genomics.

[Figure 1: phylogeny of representative sequenced plant genomes, from green algae (Chlamydomonas, Volvox) through land plants, vascular plants, seed plants and flowering plants (monocots, basal eudicots, core eudicots, asterids, rosids I and rosids II). Key: sequencing technology (Sanger only; Sanger + 454/Illumina; 454 + Illumina; Illumina only), whole genome duplication, whole genome triplication, polyploid crop species.]

Major challenges in crop genome sequencing projects
Genome assembly tools, which were generally designed and tested for non-plant species [21], are ill suited for handling the genome size, repeat content, paralogy, and heterozygosity that are common in plant genomes. The throughput of NGS technologies has made it economical to sequence most crop genomes, but resolving plant genome complexity with 100–200 bp reads is still a challenge. Most recent mammalian genomes are assembled into chromosome-scale regions [22], but most draft plant genomes remain in thousands of highly fragmented contigs or hundreds of scaffolds with numerous embedded gaps. Even the Arabidopsis genome, which is arguably the best-assembled plant genome, is still in 102 contigs with a total gap length of at least 185,644 bp (TAIR 10 [23]).

Genome size and repeat content, which are often highly correlated, present a major problem for plant genome assembly. Genome size in plants varies by more than three orders of magnitude, from 61 Mb (Genlisea tuberosa) to over 150,000 Mb (Paris japonica) (reviewed in [24]). NGS platforms can now generate enough raw data to sequence large genomes, but assembling so much data is a major computational problem. The loblolly pine genome is the largest genome assembled to date (22 Gb), and its assembly used a preprocessed, condensed set of ‘super-reads’ to reduce the computational resources needed [13].

Repeats are a major problem in genome assembly, and resolving repeat structures requires sequencing read lengths that exceed the 10–20 kb repeats commonly found in plant genomes. Class I ‘copy and paste’ long terminal repeat (LTR) retrotransposons are the most prevalent repeats in plant genomes, and their proliferation results in genome bloating. Estimating the average repeat length in the genome is crucial for choosing read lengths, sequencing technology, mate-pair library sizes and coverage.

Much of the structural variation (SV) between cultivars within plant genomes is due to the movement of LTR retrotransposons, and reference-based resequencing projects often miss or inaccurately predict SVs. De novo assembly of three divergent rice strains uncovered several megabases (Mb) of novel sequence in each strain, with many contigs containing expressed genes [25]. Issues of paralogy complicate genome assembly and result in incomplete, highly fragmented assemblies. Polyploidy is common among crop species, and plants have large multi-gene families with highly similar paralogs.
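Assembly contiguity, discussed above in terms of contigs and scaffolds and reported in Table 1 as scaffold N50, can be made concrete with a short calculation. The following is a minimal Python sketch; the function name `n50` and the toy lengths are illustrative assumptions and are not taken from any assembly cited here.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy assembly (lengths in kb), not data from any genome above.
toy_scaffolds_kb = [400, 300, 150, 100, 50]
print(n50(toy_scaffolds_kb))  # 300
```

A fragmented assembly with many short scaffolds yields a low N50 even when the total assembled length is similar, which is why the statistic is a common first summary of assembly quality.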

Outcrossing species like grape, clonally propagated crops like apple, and long-lived trees like Eucalyptus tend to have high levels of within-genome heterozygosity. Paralogous regions and heterozygous sites create ‘bubbles’ during genome assembly, where two or more highly similar regions assemble together and the adjacent dissimilar regions assemble separately but eventually merge again (Figure 2a). Assembly issues stemming from polyploidy or heterozygosity can be overcome by using diploid progenitor species (‘robusta’ coffee [26] and wheat [27]), closely related wild diploid species (woodland strawberry [28]), haploid/monoploid lines (citrus [29,30], banana [31] or peach [32]), or a bacterial artificial chromosome (BAC) by BAC sequencing approach (maize [10] or pear [33]) (Figure 2).

Organelle DNA contamination can be a major problem in genome sequencing projects. Plant cells can have over 100 chloroplasts with up to 10,000 plastid DNA copies per cell [34], and organelle-derived reads can constitute 5–20% of the total sequences in a whole genome sequencing (WGS) project. Modified DNA extraction protocols optimized for nuclei isolation are typically used, which can reduce organelle contamination several fold [35]. Organelle contamination can be tested before library construction using a simple qPCR protocol [35]. Plant nuclear genomes also contain numerous organelle-derived sequences that can be nearly identical to the organelle genomes themselves; accurately sequencing through these regions requires read lengths that can span the insertion junction sites.

Overcoming the challenges of sequencing plant genomes requires advances in sequencing technology. Longer read lengths provided by third generation single molecule sequencers like Pacific Biosciences (PacBio) offer the possibility of overcoming the complexities of plant genomes. PacBio reads generated with P6C4 chemistry average over 15 kb, and an average coverage of 30× is required to reach a quality of Q50. Circular consensus sequencing (CCS) from PacBio uses short sequences (1–5 kb) that are circularized and sequenced multiple times to produce very high quality sequences, which can be used to distinguish highly similar repeats or paralogous structures. CCS is also useful for building high confidence, allele-specific, full-length transcripts with well annotated alternative splicing sites [36]. Given high enough read depth, all but the longest repeats (such as telomeres and centromeres) can be resolved, potentially ushering in a new era of platinum quality genome assemblies. A preliminary de novo assembly of Arabidopsis

Figure 1. Distribution and characteristics of sequenced plant genomes. Phylogeny was adapted from https://genomevolution.org/wiki/index.php/Sequenced_plant_genomes and only representative crop, model and evolutionarily significant genomes are shown. Branch colors represent sequencing technologies used: Sanger only (brown), Sanger + 454/Illumina (blue), 454 + Illumina (red) and Illumina only (gold). Green and blue circles represent whole genome duplications and triplications, respectively. Branch length does not correlate with divergence.

Table 1. Published sequenced plant genomes. Over 100 plant genomes have been sequenced and published since 2000. The statistics for each genome are taken from the publication, despite several model plants having significant updates to genome assemblies and gene counts. NA, data not available in publication. Mb, megabases; kb, kilobases.

| Common name | Scientific name | Year | Phyla | Type | Size (Mb) | Gene (#) | Repeat (%) | Scaffold N50 (kb) | PMID |
|---|---|---|---|---|---|---|---|---|---|
| Arabidopsis | Arabidopsis thaliana | 2000 | Dicot | Model | 125 | 25,498 | 14 | NA | 11130711 |
| Rice | Oryza sativa | 2002 | Monocot | Crop | 430 | 59,855 | 26 | 12 | 11935017 |
| Rice | Oryza sativa | 2002 | Monocot | Crop | 420 | 29,961 | NA | NA | 11935018 |
| Rice | Oryza sativa | 2005 | Monocot | Crop | 403 | 37,544 | 26 | NA | 16100779 |
| Black Cottonwood | Populus trichocarpa | 2006 | Dicot | Crop | 485 | 45,555 | 42 | 3100 | 16973872 |
| Grape | Vitis vinifera | 2007 | Dicot | Crop | 475 | 30,434 | 41 | 2065 | 17721507 |
| Moss | Physcomitrella patens | 2008 | Bryophyte | Model | 510 | 35,938 | 16 | 1320 | 18079367 |
| Grape | Vitis vinifera | 2007 | Dicot | Crop | 505 | 29,585 | 27 | 1330 | 18094749 |
| Papaya | Carica papaya | 2008 | Dicot | Crop | 372 | 28,629 | 43 | 1000 | 18432245 |
| Lotus | Lotus japonicus | 2008 | Dicot | Non-model | 472 | 30,799 | 56 | NA | 18511435 |
| Sorghum | Sorghum bicolor | 2009 | Monocot | Crop | 818 | 34,496 | 62 | 62,400 | 19189423 |
| Cucumber | Cucumis sativus | 2009 | Dicot | Crop | 367 | 26,682 | 24 | 1140 | 19881527 |
| Corn | Zea mays | 2009 | Monocot | Crop | 2300 | 32,540 | 85 | 76 | 19965430 |
| Soybean | Glycine max | 2010 | Dicot | Crop | 1115 | 46,430 | 57 | 47,800 | 20075913 |
| Brachypodium | Brachypodium distachyon | 2010 | Monocot | Model | 272 | 25,532 | 21 | 59,300 | 20148030 |
| Castor bean | Ricinus communis | 2010 | Dicot | Crop | 320 | 31,237 | 50 | 561 | 20729833 |
| Apple | Malus × domestica | 2010 | Dicot | Crop | 742 | 57,386 | 67 | 1542 | 20802477 |
| Jatropha | Jatropha curcas | 2010 | Dicot | Crop | 380 | 40,929 | 36 | NA | 21149391 |
| Cocoa | Theobroma cacao | 2011 | Dicot | Crop | 430 | 28,798 | 24 | 473 | 21186351 |
| Strawberry | Fragaria vesca | 2011 | Dicot | Crop | 240 | 34,809 | 23 | 1300 | 21186353 |
| Lyrata | Arabidopsis lyrata | 2011 | Dicot | Model | 207 | 32,670 | 30 | 24,500 | 21478890 |
| Spikemoss | Selaginella moellendorffii | 2011 | Lycophyte | Non-model | 110 | 22,285 | 38 | 1700 | 21551031 |
| Date palm | Phoenix dactylifera | 2011 | Monocot | Crop | 658 | 28,890 | 40 | 30 | 21623354 |
| Potato | Solanum tuberosum | 2011 | Dicot | Crop | 844 | 39,031 | 62 | 1318 | 21743474 |
| Thellungiella | Thellungiella parvula | 2011 | Dicot | Model | 140 | 30,419 | 8 | 5290 | 21822265 |
| Cucumber | Cucumis sativus | 2011 | Dicot | Crop | 367 | 26,587 | NA | 319 | 21829493 |
| Chinese cabbage | Brassica rapa | 2011 | Dicot | Crop | 485 | 41,174 | 40 | 1971 | 21873998 |
| Hemp | Cannabis sativa | 2011 | Dicot | Crop | 820 | 30,074 | NA | 16 | 22014239 |
| Pigeon pea | Cajanus cajan | 2012 | Dicot | Crop | 833 | 48,680 | 52 | 516 | 22057054 |
| Medicago | Medicago truncatula | 2011 | Dicot | Model | 454 | 62,388 | 31 | 1270 | 22089132 |
| Setaria | Setaria italica | 2012 | Monocot | Model | 490 | 38,801 | 46 | 1007 | 22580950 |
| Setaria | Setaria italica | 2012 | Monocot | Model | 510 | 35,471 | 40 | 47,300 | 22580951 |
| Tomato | Solanum lycopersicum | 2012 | Dicot | Crop | 900 | 34,727 | 63 | 16,467 | 22660326 |
| Melon | Cucumis melo | 2012 | Dicot | Crop | 450 | 27,427 | NA | 4680 | 22753475 |
| Flax | Linum usitatissimum | 2012 | Dicot | Crop | 373 | 43,484 | 24 | 132 | 22757964 |
| Banana | Musa acuminata malaccensis | 2012 | Monocot | Crop | 523 | 36,542 | 44 | 1311 | 22801500 |
| Tobacco | Nicotiana benthamiana | 2012 | Dicot | Crop | 3000 | NA | NA | 89 | 22876960 |
| Cotton D | Gossypium raimondii | 2012 | Dicot | Crop | 880 | 40,976 | 60 | 2284 | 22922876 |
| Neem | Azadirachta indica | 2012 | Dicot | Crop | 364 | 20,169 | 13 | 452 | 22958331 |
| Barley | Hordeum vulgare | 2012 | Monocot | Crop | 5100 | 30,400 | 84 | NA | 23075845 |
| Pear | Pyrus bretschneideri | 2013 | Dicot | Crop | 527 | 42,812 | 53 | 541 | 23149293 |
| Dwarf birch | Betula nana | 2012 | Dicot | Non-model | 448 | NA | NA | 19 | 23167599 |
| Sweet orange | Citrus sinensis | 2013 | Dicot | Crop | 367 | 29,445 | 21 | 1690 | 23179022 |
| Watermelon | Citrullus lanatus | 2012 | Dicot | Crop | 425 | 23,440 | 45 | 2380 | 23179023 |
| Wheat | Triticum aestivum | 2012 | Monocot | Crop | 17,000 | 94,000 | 80 | NA | 23192148 |
| Cotton D | Gossypium raimondii | 2012 | Dicot | Crop | 880 | 37,505 | 61 | 18,800 | 23257886 |
| Chinese plum | Prunus mume | 2012 | Dicot | Crop | 280 | 31,390 | 45 | 578 | 23271652 |
| Chickpea | Cicer arietinum | 2013 | Dicot | Crop | 738 | 28,269 | 49 | 39,990 | 23354103 |
| Rubber tree | Hevea brasiliensis | 2013 | Dicot | Crop | 2150 | 68,955 | 72 | 3 | 23375136 |
| Moso bamboo | Phyllostachys heterocycla | 2013 | Monocot | Non-model | 2075 | 31,987 | 59 | 329 | 23435089 |
| Rice relative | Oryza brachyantha | 2013 | Monocot | Wild-relative | 300 | 32,038 | 29 | 1013 | 23481403 |
| Eutrema salsugineum | Eutrema salsugineum | 2013 | Dicot | Non-model | 243 | 26,351 | 51 | 13,400 | 23518688 |
| Peach | Prunus persica | 2013 | Dicot | Crop | 265 | 27,852 | 37 | 27,400 | 23525075 |
| Wheat DD | Aegilops tauschii | 2013 | Monocot | Crop | 4360 | 43,150 | 66 | 58 | 23535592 |

Table 1 (Continued)

| Common name | Scientific name | Year | Phyla | Type | Size (Mb) | Gene (#) | Repeat (%) | Scaffold N50 (kb) | PMID |
|---|---|---|---|---|---|---|---|---|---|
| Wheat AA | Triticum urartu | 2013 | Monocot | Crop | 4940 | 34,879 | 67 | 64 | 23535596 |
| Sacred lotus | Nelumbo nucifera | 2013 | Dicot | Non-model | 929 | 26,685 | 57 | 3400 | 23663246 |
| Bladderwort | Utricularia gibba | 2013 | Dicot | Non-model | 77 | 28,500 | 3 | 95 | 23665961 |
| Norway spruce | Picea abies | 2013 | Gymnosperm | Crop | 19,600 | 28,354 | 70 | NA | 23698360 |
| White spruce | Picea glauca | 2013 | Gymnosperm | Crop | 20,000 | NA | NA | 20 | 23698863 |
| C. grandiflora | C. grandiflora | 2013 | Dicot | Non-model | 200 | NA | NA | 98 | 23749190 |
| Neslia paniculata | Neslia paniculata | 2013 | Dicot | Non-model | NA | NA | NA | 62 | 23749190 |
| Capsella | Capsella rubella | 2013 | Dicot | Non-model | 219 | 26,521 | NA | 15,100 | 23749190 |
| Tobacco | Nicotiana sylvestris | 2013 | Dicot | Crop | 2636 | 38,940 | 72 | 80 | 23773524 |
| Tobacco | Nicotiana tomentosiformis | 2013 | Dicot | Crop | 2360 | 38,648 | 75 | 83 | 23773524 |
| Brassicaceae | Leavenworthia alabamica | 2013 | Dicot | Non-model | 316 | 30,343 | 27 | 70 | 23817568 |
| Brassicaceae | Sisymbrium irio | 2013 | Dicot | Non-model | 262 | 28,917 | 38 | 135 | 23817568 |
| Brassicaceae | Aethionema arabicum | 2013 | Dicot | Non-model | 240 | 23,167 | 37 | 118 | 23817568 |
| Genlisea | Genlisea aurea | 2013 | Dicot | Non-model | 64 | 17,755 | NA | 6 | 23855885 |
| Oil palm | Elaeis guineensis | 2013 | Monocot | Crop | 1824 | 34,802 | 18 | 1045 | 23883927 |
| Mulberry | Morus notabilis | 2013 | Dicot | Non-model | 357 | 29,338 | 47 | 390 | 24048436 |
| Kiwifruit | Actinidia chinensis | 2013 | Dicot | Crop | 758 | 39,040 | 36 | 646 | 24136039 |
| Poplar, wild | Populus euphratica | 2013 | Dicot | Wild-relative | 593 | 34,279 | 44 | 482 | 24256998 |
| Amborella | Amborella trichopoda | 2013 | Dicot | Non-model | 748 | 27,313 | 57 | 4900 | 24357323 |
| Greater duckweed | Spirodela polyrhiza | 2013 | Monocot | Crop | 158 | 19,623 | 13 | 4924 | 24548928 |
| Pepper | Capsicum annuum | 2014 | Dicot | Crop | 3260 | 35,336 | 80 | 1226 | 24591624 |
| Pepper, wild | Capsicum annuum | 2014 | Dicot | Wild-relative | 3070 | 34,476 | 81 | 445 | 24591625 |
| Loblolly pine | Pinus taeda | 2014 | Gymnosperm | Crop | 23,200 | 50,172 | 82 | 67 | 24647006 |
| Camelina | Camelina sativa | 2014 | Dicot | Crop | 785 | 89,418 | 28 | 2160 | 24759634 |
| Tobacco | Nicotiana tabacum | 2014 | Dicot | Crop | 4600 | 91,870 | 73 | 345 | 24807620 |
| Tobacco | Nicotiana tabacum | 2014 | Dicot | Crop | 4410 | 81,404 | 79 | 385 | 24807620 |
| Tobacco | Nicotiana tabacum | 2014 | Dicot | Crop | 4570 | 93,650 | 73 | 350 | 24807620 |
| Tobacco | Nicotiana otophora | 2014 | Dicot | Crop | 2700 | NA | NA | 27 | 24807620 |
| Cotton A | Gossypium arboreum | 2014 | Dicot | Crop | 1724 | 41,330 | 69 | 666 | 24836287 |
| Brassica | Brassica oleracea | 2014 | Dicot | Crop | 630 | 45,758 | 38 | 1457 | 24852848 |
| Wild radish | Raphanus raphanistrum | 2014 | Dicot | Crop | 515 | 38,174 | NA | 10 | 24876251 |
| Common bean | Phaseolus vulgaris | 2014 | Dicot | Crop | 587 | 27,197 | 45 | 50,000 | 24908249 |
| Sweet orange | Citrus sinensis | 2014 | Dicot | Crop | 367 | 25,379 | 31 | 250 | 24908277 |
| Clementine | Citrus clementina | 2014 | Dicot | Crop | 367 | 24,533 | 45 | 6800 | 24908277 |
| Eucalyptus | Eucalyptus grandis | 2014 | Dicot | Crop | 640 | 36,376 | 50 | 53,900 | 24919147 |
| Willow | Salix suchowensis | 2014 | Dicot | Non-model | 425 | 26,599 | 40 | 924 | 24980958 |
| Soybean, wild | Glycine max | 2014 | Dicot | Wild-relative | 1165 | 52,395 | 43 | 401 | 25004933 |
| Tef | Eragrostis tef | 2014 | Monocot | Crop | 772 | 38,000 | 14 | 66 | 25007843 |
| Wheat | Triticum aestivum | 2014 | Monocot | Crop | 17,000 | 124,201 | NA | NA | 25035500 |
| Rice relative | Oryza glaberrima | 2014 | Monocot | Wild-relative | 316 | 33,164 | 34 | 217 | 25064006 |
| Tomato, wild | Solanum pennellii | 2014 | Dicot | Wild-relative | 1200 | 32,273 | 82 | 1741 | 25064008 |
| Canola | Brassica napus | 2014 | Dicot | Crop | 1130 | 101,040 | 35 | 763 | 25146293 |
| Coffee | Coffea canephora | 2014 | Dicot | Crop | 710 | 25,574 | 50 | 1260 | 25190796 |
| Soybean, wild | Glycine soja | 2014 | Dicot | Wild-relative | 981 | 55,061 | NA | 18 | 25218520 |
| Soybean, wild | Glycine soja | 2014 | Dicot | Wild-relative | 1001 | 54,256 | NA | 57 | 25218520 |
| Soybean, wild | Glycine soja | 2014 | Dicot | Wild-relative | 1054 | 56,542 | NA | 17 | 25218520 |
| Soybean, wild | Glycine soja | 2014 | Dicot | Wild-relative | 1118 | 57,631 | NA | 49 | 25218520 |
| Soybean, wild | Glycine soja | 2014 | Dicot | Wild-relative | 956 | 55,901 | NA | 65 | 25218520 |
| Soybean, wild | Glycine soja | 2014 | Dicot | Wild-relative | 993 | 54,805 | NA | 52 | 25218520 |
| Soybean, wild | Glycine soja | 2014 | Dicot | Wild-relative | 889 | 54,797 | NA | 45 | 25218520 |
| Eggplant | Solanum melongena | 2014 | Dicot | Crop | 1126 | 85,446 | 70 | 64 | 25233906 |
| Cassava | Manihot esculenta | 2014 | Dicot | Wild-relative | 742 | 34,483 | 37 | 67 | 25300236 |
| Cassava | Manihot esculenta | 2014 | Dicot | Wild-relative | 742 | 38,845 | 26 | 27 | 25300236 |
| Jujube | Ziziphus jujuba | 2014 | Dicot | Crop | 444 | 32,808 | 47 | 301 | 25350882 |
| Blueberry | Vaccinium corymbosum | 2014 | Dicot | Crop | 600 | 60,000 | NA | 145 | NA |
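The assembly ‘bubbles’ that heterozygous or paralogous regions create (Figure 2a) can be illustrated with a toy de Bruijn graph. The sketch below is only an illustration under simplifying assumptions: the two haplotype strings, the k-mer size and the function names are hypothetical, and real assemblers work from error-containing reads rather than from complete haplotypes.

```python
from collections import defaultdict


def de_bruijn_graph(sequences, k):
    """Build a de Bruijn graph whose nodes are (k-1)-mers and whose
    edges are the k-mers observed in the input sequences."""
    graph = defaultdict(set)
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def branch_and_merge_nodes(graph):
    """Return nodes where paths diverge (out-degree > 1) and nodes
    where paths rejoin (in-degree > 1); together they delimit a bubble."""
    in_degree = defaultdict(int)
    for successors in graph.values():
        for node in successors:
            in_degree[node] += 1
    branches = sorted(n for n, succ in graph.items() if len(succ) > 1)
    merges = sorted(n for n, count in in_degree.items() if count > 1)
    return branches, merges


# Two hypothetical haplotypes that differ by a single SNP (T vs A).
hap_a = "ATGGCGTACCTGA"
hap_b = "ATGGCGAACCTGA"

graph = de_bruijn_graph([hap_a, hap_b], k=4)
print(branch_and_merge_nodes(graph))  # (['GCG'], ['ACC'])
```

The shared flanks collapse into single paths, while the SNP opens a bubble between the branch node and the merge node. In a polyploid or highly heterozygous genome, many such bubbles fragment the assembly.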

[Figure 2: schematic with four panels. (a) Whole genome shotgun reads from a highly heterozygous or polyploid species. (b) Double haploid or diploid species. (c) Long third generation PacBio reads, with short PacBio or WGS reads used for error correction. (d) Minimum tiling path of BACs. A, A′ and B label haplotypes or subgenomes.]

Figure 2. Strategies for sequencing complex crop genomes. A and B represent subgenomes in a polyploid, or homologous chromosomes in a highly heterozygous species; grey lines connect nearly identical regions, and light blue and red represent diverse regions. (a) A typical WGS strategy yields a highly fragmented genome, as similar regions assemble together and diverse regions assemble separately, creating ‘bubbles’ in the assembly graph. (b) A double haploid (for heterozygous species) or diploid relative/progenitor (for polyploid species) can be used to simplify the assembly. Reads from only one haplotype (in this case A) assemble without ambiguities. (c) Long reads from third generation single molecule sequencers like PacBio can be used to assemble both haplotypes separately, resulting in two complete subgenomes. Short PacBio or Illumina WGS reads are used to correct the long reads prior to assembly. (d) A more traditional and expensive BAC by BAC approach can be used, in which BACs from a minimum tiling path are sequenced separately and then stitched together to create a chimeric assembly of both haplotypes. WGS reads can be mapped to the chimera to sequence the second haplotype.

thaliana accession Ler-0 using P4C3 chemistry produced a contig N50 of 6.36 Mb, similar to the quality of the TAIR10 release (PacBio website; URL: http://blog.pacificbiosciences.com/2013/08/new-data-releasearabidopsis-assembly.html).

NGS technologies kick-start specialty crops genomics
Specialty crops, which include most fruits and vegetables, nut trees, and beverage crops, have had limited genomic resources until recently because of the high cost of early Sanger sequencing and their low production values compared to row crops. Papaya is an exception, as it was the fifth plant genome and the first specialty crop to be sequenced, with roughly 3× coverage of Sanger reads and BAC end sequences for scaffolding. The draft papaya genome showed the stability of transgenic insertions and served as a powerful model for early comparative genomics work because it lacks a lineage-specific whole genome duplication (WGD) [37]. The introduction of NGS technologies facilitated sequencing of several small specialty crop genomes including cucumber [38], apple [39], strawberry [28], cacao [40], date palm [41], and watermelon [42]. Most of the recently published crop genomes are specialty crops (see Table 1). Here we focus on specialty crops with the most economic value, and row crops with novel sequencing strategies that can be applied to other species.

Tomato seed is worth its weight in gold. Tomato is the leading vegetable crop, with a rich and diverse breeding program, and serves as a model system for fruit development. The high-quality Sanger-based genome uncovered a whole genome triplication event that facilitated neo-functionalization of genes related to fruit quality and development [43]. Comprehensive resequencing of 360 diverse tomato accessions showed that two independent sets of quantitative trait loci (QTLs) are responsible for the 100-fold increase in fruit size during tomato domestication [44]. In addition to tomato and potato, other members of the Solanaceae have recently been sequenced, including a domesticated pepper, a wild relative and 20 resequenced pepper accessions [45], and eggplant [46].

The citrus complex, which is an admixture of hybrids that includes oranges, grapefruit, lemons and limes, is the highest value fruit crop in the world. Reference citrus genomes for sweet orange [30] and Clementine mandarin [29] were sequenced using double-haploid lines to eliminate the within-genome heterozygosity that results from clonal propagation and interspecific hybridization. Resequencing efforts showed that cultivated citrus are derived from two progenitor species and that sweet orange has a complex pedigree: one parent is a shared ancestor of mandarin and the other is likely a pummelo with introgressions of wild mandarin [29,30].

Banana is a key starch staple in Africa and Asia with consumption up to 400 kg/person/year, and is the second most cultivated tropical fruit behind the citrus complex. Banana was the first non-grass monocot with a published high contiguity reference genome. Most global banana production stems from somaclones derived from a single triploid line, ‘Cavendish’, and genomic resources are essential for improved disease resistance and yield. A double-haploid banana was used to overcome issues associated with triploidy [31]. Banana has undergone three WGD events independent of the grasses, but has surprisingly few NBS-LRR disease resistance genes (89), which may contribute to its disease susceptibility [31].

Over 2.25 billion cups of coffee are consumed each day, making coffee the world’s leading beverage crop. Most coffee comes from Coffea arabica, a highly heterozygous, outcrossing allotetraploid. The coffee reference genome was generated using a double haploid accession of C. canephora, one of the diploid parents of C. arabica and the source of ‘robusta’ coffee [26]. Coffee has tandem duplications of N-methyltransferases (NMTs) that contribute to caffeine production. Comparisons with NMTs from tea and cacao suggest that caffeine biosynthesis has polyphyletic origins and evolved at least twice.

Sub-genome assisted sequencing of complex polyploid row crops
Cereal row crops such as rice, wheat, and maize supply 42% of human energy [47], while other row crops such as cotton, soybean and canola play major roles in nutrition and clothing. The first row crop genomes, rice, maize and sorghum, were sequenced using Sanger WGS and BAC by BAC approaches [7,8,10,48]. Other cereals have been sequenced recently, including barley [49], the model C3 grass B. distachyon [3] and the model C4 grass S. italica [5]. Row crops have a propensity for polyploidy, which likely contributes to their improved nutritional content and high yields. Polyploidy also confers emergent properties like seed oil accumulation in canola, spinnable fibers in cotton, and grain composition in wheat. The presence of multiple subgenomes complicates genome assembly of polyploids, but sequencing the likely diploid contributor of each subgenome is a powerful approach that has accelerated our understanding of these complex genomes.

Brassica napus (canola or rapeseed), the third largest source of vegetable oil, is an allotetraploid of B. rapa (turnips and napa cabbage) and B. oleracea (cabbage, broccoli, cauliflower, kale and other cruciferous vegetables), formed by a hybridization that occurred 7500–12,500 years ago. Most of the 20,000 Illumina and 454 based scaffolds in B. napus were assigned to either the A or C subgenome using 454 reads from each progenitor parent, allowing unprecedented comparisons of homeologous regions in a polyploid species. Despite the young age of rapeseed, around 100 genes have been lost from each subgenome, and subtle changes in epigenetic regulation, homeologous exchange and gene expression divergence have occurred. Repeated whole genome duplications in rapeseed have created a 72-fold duplication of the ancestral flowering plant genome, and unique expansions in oil biosynthesis genes and loss of glucosinolate genes were observed [50].
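The idea of assigning B. napus scaffolds to the A or C subgenome from progenitor reads can be sketched as a simple majority rule. This is a hypothetical illustration and not the procedure used in [50]: the read counts, scaffold names, thresholds and the function `assign_subgenome` are all assumptions made for the example.

```python
def assign_subgenome(a_reads, c_reads, min_fraction=0.8, min_reads=10):
    """Label a scaffold 'A', 'C' or 'unassigned' from the number of
    progenitor reads (B. rapa = A, B. oleracea = C) that map to it.

    min_fraction and min_reads are illustrative thresholds only.
    """
    total = a_reads + c_reads
    if total < min_reads:
        return "unassigned"  # too little evidence either way
    if a_reads / total >= min_fraction:
        return "A"
    if c_reads / total >= min_fraction:
        return "C"
    return "unassigned"  # mixed signal, e.g. homeologous exchange or collapse


# Hypothetical counts: scaffold name -> (reads from A parent, reads from C parent)
scaffold_counts = {
    "scaffold_1": (950, 40),
    "scaffold_2": (12, 610),
    "scaffold_3": (300, 280),
    "scaffold_4": (3, 2),
}

assignments = {
    name: assign_subgenome(a, c) for name, (a, c) in scaffold_counts.items()
}
print(assignments)
```

Scaffolds with mixed support are left unassigned here rather than forced into a subgenome, since regions of homeologous exchange would be expected to show exactly that pattern.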
Bread wheat (Triticum aestivum) is a staple food for 30% of the world’s population, but until recently genomic resources were limited because of its large (17 Gb), hexaploid (2n = 6x = 42), repeat-rich (87%) genome [51]. To overcome issues with genome complexity, flow cytometry was used with aneuploid wheat lines to isolate and sequence each chromosome arm separately, with each arm representing 1.3–3.3% of the genome. The final assembly is highly fragmented, spanning 10.2 Gb with 124,201 genes distributed unevenly across the three subgenomes. The subgenomes of wheat have limited gene loss or rearrangements, in contrast to the dynamic shuffling and loss after the much younger polyploidy event in B. napus. This suggests plasticity in the events that follow WGD; not all polyploidy events are alike. There is no global genome dominance among the wheat subgenomes, but there is cell- and stage-dependent dominance, including in gene families related to baking quality [52].

Tetraploid cotton (AADD, Gossypium hirsutum) has higher fiber yield and quality than diploid cotton (G. barbadense), an emergent property with major QTLs in the D subgenome from G. raimondii, which has no spinnable fibers [53]. Diploid cotton (G. raimondii) underwent an abrupt 5–6-fold ploidy increase after splitting from the cacao lineage, rivaled only by the Brassicaceae. Tetraploid cotton has numerous non-reciprocal DNA exchanges between the A and D subgenomes, and coordinated gene expression changes, including in nuclear mitochondrial genes involved in electron transport, which likely contribute to fiber production [54].
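The scale of the chromosome-arm sorting strategy described for bread wheat follows directly from the figures quoted in the text. The short calculation below uses only those numbers (17 Gb genome, arms of 1.3–3.3%, 10.2 Gb assembled); the variable names are illustrative and 1 Gb is taken as 1000 Mb.

```python
GENOME_GB = 17.0                 # bread wheat genome size quoted in the text
ASSEMBLED_GB = 10.2              # span of the final assembly quoted in the text
ARM_FRACTIONS = (0.013, 0.033)   # each arm is 1.3-3.3% of the genome


def arm_size_mb(fraction, genome_gb=GENOME_GB):
    """Size in Mb of a chromosome arm that is `fraction` of the genome."""
    return fraction * genome_gb * 1000


smallest_mb, largest_mb = (arm_size_mb(f) for f in ARM_FRACTIONS)
assembled_fraction = ASSEMBLED_GB / GENOME_GB

print(f"arm sizes: {smallest_mb:.0f}-{largest_mb:.0f} Mb")
print(f"assembled fraction: {assembled_fraction:.0%}")
```

Each sorted arm is therefore roughly 221–561 Mb, larger than several entire genomes listed in Table 1, and the final assembly spans about 60% of the estimated genome size.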

Tree crop genomes
Trees are long-lived perennials that are valued for timber, fuel and other products. Extraordinary progress has been made on economically important tree crops, starting with the publication of P. trichocarpa (poplar [9]) in 2006, which made it the third published plant genome. The genome of another fast growing and economically important tree, Eucalyptus grandis, was published more recently [55]. The Eucalyptus genome revealed an expansion of terpene synthesis genes associated with defense, as well as the largest number of tandem repeats of any sequenced genome to date. Three gymnosperm genomes, Pinus taeda (loblolly pine [13]), Picea glauca (white spruce [56]), and Picea abies (Norway spruce [57]), have also been sequenced recently. In addition to being the first three gymnosperm genomes, they are also the largest genomes sequenced to date. Since the generation time of most trees is long compared to row crops, draft genomes enable technologies like genomic selection (GS), which models superior genotypes using genome-wide markers and limited phenotyping, reducing time-consuming phenotypic selection and breeding cycles [58].
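Genomic selection, as described above, fits marker effects on a small phenotyped training population and then ranks unphenotyped candidates from their genotypes alone. The sketch below uses ridge regression as one common way to do this; it is an assumption-laden illustration rather than the method of [58]. All data are simulated, and the function names, marker count, noise level and penalty `lam` are hypothetical.

```python
import random


def ridge_fit(X, y, lam):
    """Estimate marker effects by solving (X'X + lam*I) beta = X'y."""
    p = len(X[0])
    # Augmented matrix [X'X + lam*I | X'y], solved by Gauss-Jordan elimination.
    A = [
        [sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
         for j in range(p)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(p)
    ]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


def predict(X, beta):
    """Predicted value of each line: sum of genotype * marker effect."""
    return [sum(x * b for x, b in zip(row, beta)) for row in X]


# Simulated data only: 20 markers coded 0/1/2 with hypothetical effects.
rng = random.Random(0)
n_markers = 20
true_effects = [rng.gauss(0, 1) for _ in range(n_markers)]


def simulate_lines(n):
    return [[rng.randint(0, 2) for _ in range(n_markers)] for _ in range(n)]


training = simulate_lines(60)    # genotyped and phenotyped
candidates = simulate_lines(10)  # genotyped only
phenotypes = [g + rng.gauss(0, 0.5) for g in predict(training, true_effects)]

effects = ridge_fit(training, phenotypes, lam=1.0)
scores = predict(candidates, effects)
ranking = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
print("candidate lines ranked by predicted value:", ranking)
```

For a tree crop, the appeal is that the candidates can be ranked as seedlings, years before the trait itself could be measured.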

Figure 3

[Figure panels: (a) abiotic stress treatments (100%, 75%, 50% and 25% soil capacity, no water) with photosynthesis measurements and electrolyte leakage assays; (b) gene co-regulation network; (c) ENCODE analyses of nucleosome hypersensitivity sites, TFs, DNA methylation (CH3), long-range regulatory elements, cis-regulatory elements (promoters, TF binding sites), exons and transcripts; (d) example tracks for H3K4me3 ChIP-seq (methylation), ChIP-seq (protein/histone binding), DNase-seq (hypersensitive sites), mRNAseq (transcript abundance) and DNAseq (resequencing; SNP, deletion).]

The next frontier of plant genomics: overview of the Brachypodium plant ENCODE project. (a) Plants (in this case Brachypodium) are subjected to various abiotic stresses and responses are measured using high-throughput phenotyping and physiology. Material from stressed and control plants is used to generate the ENCODE datasets in panel d. (b) Gene co-regulation network produced from ENCODE data. (c) Genetic elements that are targeted in an ENCODE project. (d) Example datasets from an ENCODE project. Peaks reflect a high depth of Illumina reads corresponding to highly methylated regions (H3K4me3 ChIP-seq), regions with TF or histone binding (ChIP-seq), open chromatin regions (DNase-seq), transcribed regions (mRNAseq) and polymorphisms/InDels between different lines (DNAseq). Taken together, ENCODE datasets can paint a near complete picture of epigenetic regulation under a given stress.

Orphan crops

Orphan crops have undergone limited improvement from their wild relatives and suffer from unrecognized nutritional value, disease susceptibility, poor shelf life and growth constraints. However, orphan crops like pigeon pea (Cajanus cajan), cassava and tef (Eragrostis tef) are major staples in underdeveloped regions, and genomic resources are essential to boost production. The draft genome of pigeon pea provided over 300,000 SSR markers for plant breeding and screening of the 13,632 accessions maintained in the ICRISAT genbank [59]. Pigeon pea has a large repertoire of universal drought response proteins, which may contribute to its drought tolerance. Draft genomes for both wild and cultivated cassava are available, as well as a repertoire of SNPs for breeding. Cassava genes involved in starch accumulation, photosynthesis and abiotic stresses have been positively selected during domestication, and genes involved in toxic cyanogenic glucoside formation have been purged [60]. A draft genome of tef identified novel SSR loci for marker-assisted breeding and provides a framework for identifying genes related to abiotic stresses and nutrition [61].
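The SSR (simple sequence repeat) markers mentioned for pigeon pea and tef are short motifs repeated in tandem, which makes candidate loci easy to locate once a draft sequence exists. The sketch below is only an illustration of that idea and is not the pipeline used in [59] or [61]. It assumes perfect (uninterrupted) repeats, and the function name and the thresholds for motif length and repeat number are hypothetical defaults.

```python
import re


def find_ssrs(sequence, min_unit=2, max_unit=6, min_repeats=4):
    """Return (start, motif, repeat_count) for perfect SSRs in a DNA string.

    Matches are non-overlapping and scanned left to right. The thresholds are
    illustrative assumptions, not values taken from any published pipeline.
    """
    # A motif of min_unit..max_unit bases (shortest tried first), followed by
    # at least min_repeats - 1 further exact copies of the same motif.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(sequence.upper()):
        motif = match.group(1)
        if len(set(motif)) == 1:
            continue  # skip homopolymer runs such as AAAAAAAA
        repeats = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, repeats))
    return hits


if __name__ == "__main__":
    # Toy sequence containing an (AT)5 and a (GAA)4 repeat.
    toy = "GGC" + "AT" * 5 + "CCGT" + "GAA" * 4 + "TC"
    for start, motif, repeats in find_ssrs(toy):
        print(f"position {start}: ({motif}){repeats}")
```

In practice, primers would then be designed in the unique sequence flanking each repeat, so that differences in repeat number between accessions can be scored as marker alleles.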

Conclusion: beyond the assembled genome and the future of crop genomics

Though reference genomes are now available for many crops, only diploid/haploid references are available for polyploid crops like potato, coffee, strawberry and banana, and most highly heterozygous genomes have only one sequenced haplotype. Some of the elements contributing to agronomic traits, like gene duplications, genome rearrangements and repeat integrations, are cultivar specific and cannot be found using resequencing strategies or reduced complexity references. High quality references of each subgenome (in polyploids) and each haplotype (in heterozygous crops), as well as multiple references per crop species, are needed to survey true variation.

Many crop genomes, especially those of vegetables, have gone through a very tight breeding bottleneck to arrive at our table. As a result, a great deal of diversity has been lost in current breeding germplasm, leading to slowed improvement and potential loss of disease resistance. There is a pressing need to develop genomic resources for crop wild relatives so that they can be used in breeding, allele identification and introgression [62]. Draft genomes from wild relatives of tomato (Solanum pennellii), soybean (Glycine soja) [63–65] and cassava [60] are currently available, but more wild species are needed for crop improvement programs.

An assembled reference genome sequence is simply a foundation; the true challenge is to identify the features of the genome that describe the biology. Although every cell has essentially the same DNA sequence, epigenetic decorations and gene expression vary greatly between cells depending on the environment, developmental stage and tissue type. The next phase of crop genomics will be to completely elucidate these biologically active states of DNA in plant ENCODE (Encyclopedia of DNA Elements) projects [66], as has been done for other model systems such as human, mouse, Drosophila and C. elegans. Such studies can take several forms, such as integrated maps of DNA methylation, small RNA, histone modification and transcript abundance measured across multiple tissues and conditions (Figure 3). Brachypodium has the first formal plant ENCODE project, funded by DOE/USDA, in which detailed molecular maps of epigenetic modifications and expression will be generated and phenotyping will be leveraged to understand the networks active under drought conditions (Figure 3, DOE award abstract; URL: http://genomicscience.energy.gov/research/DOEUSDA/abstracts/2014mockler_abstract.shtml). Once we have detailed genome maps, the ability to edit specific sequences, as has been done in rice and wheat [67], will enable the next generation of crop domestication.

The amount of available genomic data for crop plants is staggering, and thousands of Gb of plant sequences are deposited in NCBI and other public databases monthly. As a community, we are about to have resources we could only dream of. How will we use them to meet the challenge of feeding 9 billion people by 2050? New tools for analyzing these high-throughput datasets are desperately needed, and training of young scientists should shift toward computer science and engineering in order to prosper in the changing face of biological research. How will we build the plant genomicists of the future who only know science with big data, whole genome analysis and full information access?

Acknowledgements

This work was supported in part by funding from the National Science Foundation (DBI-1401572) to R.V., and DARPA (HR0011-13-C-0103) to T.P.M.

References and recommended reading

Papers of particular interest, published within the period of review, have been highlighted as of special interest or of outstanding interest.

1. Initiative AG: Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 2000, 408:796.

2. Michael TP, Jackson S: The first 50 plant genomes. Plant Genome 2013:6.

3. Vogel JP, Garvin DF, Mockler TC, Schmutz J, Rokhsar D, Bevan MW, Barry K, Lucas S, Harmon-Smith M, Lail K: Genome sequencing and analysis of the model grass Brachypodium distachyon. Nature 2010, 463:763-768.

4. Rensing SA, Lang D, Zimmer AD, Terry A, Salamov A, Shapiro H, Nishiyama T, Perroud P-F, Lindquist EA, Kamisugi Y: The Physcomitrella genome reveals evolutionary insights into the conquest of land by plants. Science 2008, 319:64-69.

5. Bennetzen JL, Schmutz J, Wang H, Percifield R, Hawkins J, Pontaroli AC, Estep M, Feng L, Vaughn JN, Grimwood J: Reference genome sequence of the model plant Setaria. Nat Biotechnol 2012, 30:555-561.

6. Zhang G, Liu X, Quan Z, Cheng S, Xu X, Pan S, Xie M, Zeng P, Yue Z, Wang W: Genome sequence of foxtail millet (Setaria italica) provides insights into grass evolution and biofuel potential. Nat Biotechnol 2012, 30:549-554.

7. Goff SA, Ricke D, Lan T-H, Presting G, Wang R, Dunn M, Glazebrook J, Sessions A, Oeller P, Varma H: A draft sequence of the rice genome (Oryza sativa L. ssp. japonica). Science 2002, 296:92-100.

8. Yu J, Hu S, Wang J, Wong GK-S, Li S, Liu B, Deng Y, Dai L, Zhou Y, Zhang X: A draft sequence of the rice genome (Oryza sativa L. ssp. indica). Science 2002, 296:79-92.

9. Tuskan GA, Difazio S, Jansson S, Bohlmann J, Grigoriev I, Hellsten U, Putnam N, Ralph S, Rombauts S, Salamov A: The genome of black cottonwood, Populus trichocarpa (Torr. & Gray). Science 2006, 313:1596-1604.

10. Schnable PS, Ware D, Fulton RS, Stein JC, Wei F, Pasternak S, Liang C, Zhang J, Fulton L, Graves TA: The B73 maize genome: complexity, diversity, and dynamics. Science 2009, 326:1112-1115.

11. Schmutz J, Cannon SB, Schlueter J, Ma J, Mitros T, Nelson W, Hyten DL, Song Q, Thelen JJ, Cheng J: Genome sequence of the palaeopolyploid soybean. Nature 2010, 463:178-183.

12. Consortium TG: The tomato genome sequence provides insights into fleshy fruit evolution. Nature 2012, 485:635-641.

13. Zimin A, Stevens KA, Crepeau MW, Holtz-Morris A, Koriabine M, Marçais G, Puiu D, Roberts M, Wegrzyn JL, de Jong PJ: Sequencing and assembly of the 22-Gb loblolly pine genome. Genetics 2014, 196:875-890.
The authors use a novel reduced complexity sequencing strategy to assemble the 22 Gb loblolly pine genome, the largest genome sequenced to date.

14. Bennetzen JL, Kellogg EA: Do plants have a one-way ticket to genomic obesity? Plant Cell 1997, 9:1509.

15. Fleischmann A, Michael TP, Rivadavia F, Sousa A, Wang W, Temsch EM, Greilhuber J, Müller KF, Heubl G: Evolution of genome size and chromosome number in the carnivorous plant genus Genlisea (Lentibulariaceae), with a new estimate of the minimum genome size in angiosperms. Annals Bot 2014. mcu189.

16. Leushkin EV, Sutormin RA, Nabieva ER, Penin AA, Kondrashov AS, Logacheva MD: The miniature genome of a carnivorous plant Genlisea aurea contains a low number of genes and short noncoding sequences. BMC Genomics 2013, 14:476.

17. Ibarra-Laclette E, Lyons E, Hernández-Guzmán G, Pérez-Torres CA, Carretero-Paulet L, Chang T-H, Lan T, Welch AJ, Juárez MJA, Simpson J: Architecture and evolution of a minute plant genome. Nature 2013, 498:94-98.
The compact bladderwort genome provides evidence that almost all intragenic space and repeat sequences can be purged.

18. Wang W, Haberer G, Gundlach H, Gläßer C, Nussbaumer T, Luo M, Lomsadze A, Borodovsky M, Kerstetter R, Shanklin J: The Spirodela polyrhiza genome reveals insights into its neotenous reduction fast growth and aquatic lifestyle. Nat Commun 2014:5.

19. Banks JA, Nishiyama T, Hasebe M, Bowman JL, Gribskov M, Albert VA, Aono N, Aoyama T, Ambrose BA, Ashton NW: The Selaginella genome identifies genetic changes associated with the evolution of vascular plants. Science 2011, 332:960-963.

20. Albert VA, Barbazuk WB, Der JP, Leebens-Mack J, Ma H, Palmer JD, Rounsley S, Sankoff D, Schuster SC, Soltis DE: The Amborella genome and the evolution of flowering plants. Science 2013, 342:1241089.
Amborella is the most basal flowering plant and serves as a powerful reference for comparative genomics. The Amborella genome provided evidence for an ancient WGD that predated all flowering plants.

21. Metzker ML: Sequencing technologies — the next generation. Nat Rev Genet 2009, 11:31-46.

22. Gnerre S, MacCallum I, Przybylski D, Ribeiro FJ, Burton JN, Walker BJ, Sharpe T, Hall G, Shea TP, Sykes S: High-quality draft assemblies of mammalian genomes from massively parallel sequence data. Proc Natl Acad Sci 2011, 108:1513-1518.

23. Lamesch P, Berardini TZ, Li D, Swarbreck D, Wilks C, Sasidharan R, Muller R, Dreher K, Alexander DL, Garcia-Hernandez M: The Arabidopsis Information Resource (TAIR): improved gene annotation and new tools. Nucleic Acids Res 2012, 40:D1202-D1210.

24. Michael TP: Plant genome size variation: bloating and purging DNA. Briefings Funct Genomics 2014. elu005.

25. Schatz MC, Maron LG, Stein JC, Wences AH, Gurtowski J, Biggers E, Lee H, Kramer M, Antoniou E, Ghiban E: Whole genome de novo assemblies of three divergent strains of rice, Oryza sativa, document novel gene space of aus and indica. Genome Biol 2014, 15:506.
The authors produce de novo sequences of three diverse rice strains and uncover several Mb of sequences unique to each strain that could not have been identified using a standard resequencing approach.

26. Denoeud F, Carretero-Paulet L, Dereeper A, Droc G, Guyot R, Pietrella M, Zheng C, Alberti A, Anthony F, Aprea G: The coffee genome provides insight into the convergent evolution of caffeine biosynthesis. Science 2014, 345:1181-1184.

27. Ling H-Q, Zhao S, Liu D, Wang J, Sun H, Zhang C, Fan H, Li D, Dong L, Tao Y: Draft genome of the wheat A-genome progenitor Triticum urartu. Nature 2013, 496:87-90.

28. Shulaev V, Sargent DJ, Crowhurst RN, Mockler TC, Folkerts O, Delcher AL, Jaiswal P, Mockaitis K, Liston A, Mane SP: The genome of woodland strawberry (Fragaria vesca). Nat Genet 2011, 43:109-116.

29. Wu GA, Prochnik S, Jenkins J, Salse J, Hellsten U, Murat F, Perrier X, Ruiz M, Scalabrin S, Terol J: Sequencing of diverse mandarin, pummelo and orange genomes reveals complex history of admixture during citrus domestication. Nat Biotechnol 2014, 32:656-662.

30. Xu Q, Chen L-L, Ruan X, Chen D, Zhu A, Chen C, Bertrand D, Jiao W-B, Hao B-H, Lyon MP: The draft genome of sweet orange (Citrus sinensis). Nat Genet 2013, 45:59-66.

31. D’Hont A, Denoeud F, Aury J-M, Baurens F-C, Carreel F, Garsmeur O, Noel B, Bocs S, Droc G, Rouard M: The banana (Musa acuminata) genome and the evolution of monocotyledonous plants. Nature 2012, 488:213-217.

32. Verde I, Abbott AG, Scalabrin S, Jung S, Shu S, Marroni F, Zhebentyayeva T, Dettori MT, Grimwood J, Cattonaro F: The high-quality draft genome of peach (Prunus persica) identifies unique patterns of genetic diversity, domestication and genome evolution. Nat Genet 2013, 45:487-494.

33. Wu J, Wang Z, Shi Z, Zhang S, Ming R, Zhu S, Khan MA, Tao S, Korban SS, Wang H: The genome of the pear (Pyrus bretschneideri Rehd.). Genome Res 2013, 23:396-408.

34. Shaver JM, Oldenburg DJ, Bendich AJ: Changes in chloroplast DNA during development in tobacco, Medicago truncatula, pea, and maize. Planta 2006, 224:72-82.

35. Lutz KA, Wang W, Zdepski A, Michael TP: Isolation and analysis of high quality nuclear DNA with reduced organellar DNA for plant genome sequencing and resequencing. BMC Biotechnol 2011, 11:54.

36. Tilgner H, Grubert F, Sharon D, Snyder MP: Defining a personal, allele-specific, and single-molecule long-read transcriptome. Proc Natl Acad Sci 2014, 111:9869-9874.
The authors use single-molecule long-reads to assemble a haplotype specific transcriptome rich with gene isoforms which has broad applications for crop genomics.

37. Ming R, Hou S, Feng Y, Yu Q, Dionne-Laporte A, Saw JH, Senin P, Wang W, Ly BV, Lewis KL: The draft genome of the transgenic tropical fruit tree papaya (Carica papaya Linnaeus). Nature 2008, 452:991-996.

38. Huang S, Li R, Zhang Z, Li L, Gu X, Fan W, Lucas WJ, Wang X, Xie B, Ni P: The genome of the cucumber, Cucumis sativus L. Nat Genet 2009, 41:1275-1281.

39. Velasco R, Zharkikh A, Affourtit J, Dhingra A, Cestaro A, Kalyanaraman A, Fontana P, Bhatnagar SK, Troggio M, Pruss D: The genome of the domesticated apple (Malus × domestica Borkh.). Nat Genet 2010, 42:833-839.

40. Argout X, Salse J, Aury J-M, Guiltinan MJ, Droc G, Gouzy J, Allegre M, Chaparro C, Legavre T, Maximova SN: The genome of Theobroma cacao. Nat Genet 2011, 43:101-108.

41. Al-Dous EK, George B, Al-Mahmoud ME, Al-Jaber MY, Wang H, Salameh YM, Al-Azwani EK, Chaluvadi S, Pontaroli AC, DeBarry J: De novo genome sequencing and comparative genomics of date palm (Phoenix dactylifera). Nat Biotechnol 2011, 29:521-527.

42. Guo S, Zhang J, Sun H, Salse J, Lucas WJ, Zhang H, Zheng Y, Mao L, Ren Y, Wang Z: The draft genome of watermelon (Citrullus lanatus) and resequencing of 20 diverse accessions. Nat Genet 2013, 45:51-58.

43. Consortium PGS: Genome sequence and analysis of the tuber crop potato. Nature 2011, 475:189-195.

44. Lin T, Zhu G, Zhang J, Xu X, Yu Q, Zheng Z, Zhang Z, Lun Y, Li S, Wang X: Genomic analyses provide insights into the history of tomato breeding. Nat Genet 2014, 46:1220-1226.
The authors resequence 360 wild and cultivated tomato accessions and uncover two independent sets of QTLs that increased fruit size 100x compared to wild tomatoes.

45. Qin C, Yu C, Shen Y, Fang X, Chen L, Min J, Cheng J, Zhao S, Xu M, Luo Y: Whole-genome sequencing of cultivated and wild peppers provides insights into Capsicum domestication and specialization. Proc Natl Acad Sci 2014, 111:5135-5140.

46. Hirakawa H, Shirasawa K, Miyatake K, Nunome T, Negoro S, Ohyama A, Yamaguchi H, Sato S, Isobe S, Tabata S: Draft genome sequence of eggplant (Solanum melongena L.): the representative solanum species indigenous to the old world. DNA Res 2014. dsu027.

47. Elert E: Rice by the numbers: a good grain. Nature 2014, 514:S50-S51.

48. Paterson AH, Bowers JE, Bruggmann R, Dubchak I, Grimwood J, Gundlach H, Haberer G, Hellsten U, Mitros T, Poliakov A: The Sorghum bicolor genome and the diversification of grasses. Nature 2009, 457:551-556.

49. Consortium IBGS: A physical, genetic and functional sequence assembly of the barley genome. Nature 2012, 491:711-716.

50. Chalhoub B, Denoeud F, Liu S, Parkin IA, Tang H, Wang X, Chiquet J, Belcram H, Tong C, Samans B: Early allopolyploid evolution in the post-Neolithic Brassica napus oilseed genome. Science 2014, 345:950-953.
The authors used sequences from the diploid parental species to separate scaffolds by subgenome in the allopolyploid Brassica napus genome, providing an unprecedented look at the early events of polyploidy.

51. Mayer KF, Rogers J, Doležel J, Pozniak C, Eversole K, Feuillet C, Gill B, Friebe B, Lukaszewski AJ, Sourdille P: A chromosome-based draft sequence of the hexaploid bread wheat (Triticum aestivum) genome. Science 2014, 345:1251788.
The authors use flow cytometry to separate individual chromosomes for sequencing the hexaploid wheat genome unveiling emergent properties related to grain composition.

52. Pfeifer M, Kugler KG, Sandve SR, Zhan B, Rudi H, Hvidsten TR, Mayer KF, Olsen O-A: Genome interplay in the grain transcriptome of hexaploid bread wheat. Science 2014, 345:1250091.

53. Jiang C-X, Wright RJ, El-Zik KM, Paterson AH: Polyploid formation created unique avenues for response to selection in Gossypium (cotton). Proc Natl Acad Sci 1998, 95:4419-4424.

54. Paterson AH, Wendel JF, Gundlach H, Guo H, Jenkins J, Jin D, Llewellyn D, Showmaker KC, Shu S, Udall J: Repeated polyploidization of Gossypium genomes and the evolution of spinnable cotton fibres. Nature 2012, 492:423-427.

55. Myburg AA, Grattapaglia D, Tuskan GA, Hellsten U, Hayes RD, Grimwood J, Jenkins J, Lindquist E, Tice H, Bauer D: The genome of Eucalyptus grandis. Nature 2014, 510:356-362.

56. Birol I, Raymond A, Jackman SD, Pleasance S, Coope R, Taylor GA, Saint Yuen MM, Keeling CI, Brand D, Vandervalk BP: Assembling the 20 Gb white spruce (Picea glauca) genome from whole-genome shotgun sequencing data. Bioinformatics 2013. btt178.

57. Nystedt B, Street NR, Wetterbom A, Zuccolo A, Lin Y-C, Scofield DG, Vezzi F, Delhomme N, Giacomello S, Alexeyenko A: The Norway spruce genome sequence and conifer genome evolution. Nature 2013, 497:579-584.

58. Resende MF, Muñoz P, Resende MD, Garrick DJ, Fernando RL, Davis JM, Jokela EJ, Martin TA, Peter GF, Kirst M: Accuracy of genomic selection methods in a standard data set of loblolly pine (Pinus taeda L.). Genetics 2012, 190:1503-1510.

59. Varshney RK, Chen W, Li Y, Bharti AK, Saxena RK, Schlueter JA, Donoghue MT, Azam S, Fan G, Whaley AM: Draft genome sequence of pigeonpea (Cajanus cajan), an orphan legume crop of resource-poor farmers. Nat Biotechnol 2012, 30:83-89.

60. Wang W, Feng B, Xiao J, Xia Z, Zhou X, Li P, Zhang W, Wang Y, Møller BL, Zhang P: Cassava genome from a wild ancestor to cultivated varieties. Nat Commun 2014:5.

61. Cannarozzi G, Plaza-Wüthrich S, Esfeld K, Larti S, Wilson YS, Girma D, de Castro E, Chanyalew S, Blösch R, Farinelli L: Genome and transcriptome sequencing identifies breeding targets in the orphan crop tef (Eragrostis tef). BMC Genomics 2014, 15:581.

62. McCouch S, Baute GJ, Bradeen J, Bramel P, Bretting PK, Buckler E, Burke JM, Charest D, Cloutier S, Cole G: Agriculture: feeding the future. Nature 2013, 499:23-24.

63. Kim MY, Lee S, Van K, Kim T-H, Jeong S-C, Choi I-Y, Kim D-S, Lee Y-S, Park D, Ma J: Whole-genome sequencing and intensive analysis of the undomesticated soybean (Glycine soja Sieb. and Zucc.) genome. Proc Natl Acad Sci 2010, 107:22032-22037.

64. Li Y, Zhao S, Ma J, Li D, Yan L, Li J, Qi X-t, Guo X-s, Zhang L, He Wm: Molecular footprints of domestication and improvement in soybean revealed by whole genome re-sequencing. BMC Genomics 2013, 14:579.

65. Li Y-h, Zhou G, Ma J, Jiang W, Jin L-g, Zhang Z, Guo Y, Zhang J, Sui Y, Zheng L: De novo assembly of soybean wild relatives for pan-genome analysis of diversity and agronomic traits. Nat Biotechnol 2014, 32:1045-1052.

66. Lane AK, Niederhuth CE, Ji L, Schmitz RJ: pENCODE: a plant encyclopedia of DNA elements. Annu Rev Genet 2014:48.
This review provides a detailed explanation of the methods and utility of ENCODE projects and the future of crop genomics.

67. Shan Q, Wang Y, Li J, Gao C: Genome editing in rice and wheat using the CRISPR/Cas system. Nat Protocols 2014, 9:2395-2410.
