Transcriptome profiling reveals mosaic genomic origins of modern cultivated barley

Fei Dai (a), Zhong-Hua Chen (a,b), Xiaolei Wang (a), Zefeng Li (a), Gulei Jin (a), Dezhi Wu (a), Shengguan Cai (a), Ning Wang (c), Feibo Wu (a), Eviatar Nevo (d,1), and Guoping Zhang (a,1)

(a) Department of Agronomy, Zhejiang Key Laboratory of Crop Germplasm, Zhejiang University, Hangzhou 310058, China; (b) School of Science and Health, University of Western Sydney, Richmond, NSW 2753, Australia; (c) Faculty of Life and Environmental Sciences, University of Tsukuba, Tsukuba, Ibaraki 305-8577, Japan; and (d) Institute of Evolution, University of Haifa, Mount Carmel, Haifa 31905, Israel

The domestication of cultivated barley has been used as a model system for studying the origins and early spread of agrarian culture. Our previous results indicated that the Tibetan Plateau and its vicinity is one of the centers of domestication of cultivated barley. Here we reveal multiple origins of domesticated barley using transcriptome profiling of cultivated and wild-barley genotypes. Approximately 48 Gb of clean transcript sequences in 12 Hordeum spontaneum and 9 Hordeum vulgare accessions were generated. We report 12,530 de novo assembled transcripts in all of the 21 samples. Population structure analysis showed that Tibetan hulless barley (qingke) might have existed in the early stage of domestication. Based on the large number of unique genomic regions showing similarity between cultivated and wild-barley groups, we propose that the genomic origin of modern cultivated barley is derived from wild-barley genotypes in the Fertile Crescent (mainly in chromosomes 1H, 2H, and 3H) and Tibet (mainly in chromosomes 4H, 5H, 6H, and 7H). This study indicates that the domestication of barley may have occurred over time in geographically distinct regions.

evolution | genetic diversity | single nucleotide variants | genomic similarity | RNA-Seq

Domestication of crops is the outcome of complex independent or combined processes of artificial and natural selection that lead to plants adapted to cultivation and human consumption (1, 2). Wild barley (Hordeum spontaneum L.), the progenitor of cultivated barley (H. vulgare L.), is one of the founder crops of the Old World for Neolithic food production (3), and it harbors a myriad of mutations favorable for its adaptation to harsh environments. Hence, wild barley could provide natural sources of genetic diversity for plant abiotic and biotic stress tolerance (3, 4). Understanding the domestication process of cultivated barley will therefore be helpful for exploiting elite genetic resources in wild barley and breaking the current bottleneck in modern barley breeding caused by narrower genetic diversity (1, 4). The Near East Fertile Crescent is commonly recognized as a major evolutionary center of wild barley and domestication of cultivated forms (3–5). Many reports indicate that cultivated barley was first domesticated about 10,000 y ago in the Near East Fertile Crescent (3, 5). However, the debate on the monophyletic or polyphyletic origin of barley remains contentious (1, 5–9). Morrell and Clegg, based on the resequencing data from 18 loci, proposed that barley has been domesticated at least twice, once in the Near East Fertile Crescent and then 1,500–3,000 km farther east (7). Genotyping of chloroplast microsatellite markers has also suggested that barley has been domesticated more than once, on each occasion in a different geographical region (10). Unlike wheat and other founder crops, wild barley is naturally distributed widely from the Near East to Central Asia and the Tibetan Plateau (5, 11, 12). There is increasing evidence to support the theory that cultivated barley is of polyphyletic origin (7, 9, 13, 14).

www.pnas.org/cgi/doi/10.1073/pnas.1414335111

Barley is an annual diploid grass species with a large haploid genome of 5.1 Gb and a high abundance of repetitive elements (15). RNA sequencing (RNA-Seq) is a high-throughput technology for transcriptome profiling that uses deep sequencing for rapid characterization of transcript sequences and gene expression (16). This technique is effective for detecting not only differentially expressed genes, but also sequence variants and new transcripts (17). Recent advances in the characterization and quantification of transcriptomes with RNA-Seq have been made in rice (18), maize (19), and barley (20). In view of the large genome of barley and the wide genetic diversity of wild barley (4, 14, 21), RNA-Seq has become a very effective and powerful technology for generating comprehensive transcriptome profiles (22). Our previous studies have shown significant genetic differences between wild barley from the Near East and Tibet (14). However, no research has been conducted using comparative transcriptomics to distinguish between cultivated barley and different wild-barley populations. We hypothesized that the genome segments of cultivated barley should show certain similarity with its ancestral wild barley, and that using RNA-Seq we should be able to determine the genomic origin of cultivated barley. Specifically, we selected some representative wild-barley genotypes from the Near East and Tibet, and representative world-wide selections of cultivated barley genotypes, conducted their transcriptome profiling, and investigated the genomic origin of modern cultivated barley.

Significance

Archaeological, historic, and molecular studies have indicated that the Fertile Crescent in the Near East is a major center of origin and domestication of cultivated barley. However, growing evidence in recent years supports the theory of a polyphyletic origin of barley. To our knowledge, this is the first attempt using RNA sequencing techniques and genomic similarity analysis to study the domestication of barley. We confirmed the polyphyletic origin of cultivated barley and our results revealed the genomic origin of modern cultivated barley is from both the Near East and Tibet, with different contributions on each chromosome. These findings advance our understanding of the early spread of agrarian culture, and are expected to be of wide interest to evolutionary geneticists and plant breeders.

Author contributions: F.D., E.N., and G.Z. designed research; F.D., X.W., G.J., D.W., S.C., and F.W. performed research; F.D., Z.-H.C., X.W., Z.L., G.J., D.W., and N.W. analyzed data; and F.D., Z.-H.C., E.N., and G.Z. wrote the paper.

The authors declare no conflict of interest.

Freely available online through the PNAS open access option.

Data deposition: The sequences reported in this paper have been deposited in the National Center for Biotechnology Information Sequence Read Archive, www.ncbi.nlm.nih.gov (accession nos. SAMN02483491–SAMN02483511).

1 To whom correspondence may be addressed. Email: [email protected] or [email protected].

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1414335111/-/DCSupplemental.

PNAS | September 16, 2014 | vol. 111 | no. 37 | 13403–13408

EVOLUTION

Contributed by Eviatar Nevo, July 29, 2014 (sent for review March 2, 2014)

Results

RNA-Seq Performance and Data Analysis. We performed an RNA-Seq analysis on samples collected at the seedling stage of 12 wild (Table S1) and 9 cultivated barley genotypes (Table S2) using the Illumina HiSeq 2000 platform. Overall, paired-end sequencing (100 bp) of the transcriptome yielded over 635 million raw reads (59.16 Gb) for the 21 libraries (Table S3). After removing all of the adaptor sequences, empty reads, and low-quality reads, around 534 million clean reads remained (Table S3), with a mean of 2.63 and 1.82 Gb for wild and cultivated barley, respectively. Approximately 81.9–85.5% of the clean reads were mapped to the current whole-genome shotgun (WGS) contigs of cv. Morex (15), yielding 120,837 transcripts with a mean length of 2,336 bp (Table S3). In total, 61,611 genes were identified for both wild and cultivated barley, in which 16,993 and 9,443 transcripts were unique for wild and cultivated barley, respectively. The remaining reads were aligned to the barley full-length cDNA (fl-cDNA) (23), resulting in 3,861 transcripts and 3,276 genes in the 21 samples. To discover previously unrecognized transcripts, unmapped RNA-Seq reads were de novo assembled, resulting in the assembly of 12,530 de novo transcripts (Fig. S1A). Around 74.0% of the de novo assembled transcripts were found in all of the samples. However, 18.6% of these transcripts were detected only in wild-barley accessions from the Near East and Tibet (Fig. S1A). These results indicated that wild barley has a much larger transcript diversity than cultivated barley.

Discovery of Exon Single Nucleotide Variants and Indels and Their Mapping to the Synthetic Assembly of the Barley Genome. Single nucleotide variant (SNV) and indel detection in exons was performed using the dataset referring to the current WGS contigs of cv. Morex and its assembly. After aligning the reads in each sample against the WGS contigs of cv. Morex, we identified a total of 247,611 SNVs and 9,084 indels in 31,084 genes, including 191,534 SNVs and 6,744 indels in 20,682 genes mapped to seven barley chromosomes (Table S4). A large number of SNVs were detected in wild barley from the Near East, ranging from 46,538 to 58,502, and in wild barley from Tibet, ranging from 38,704 to 52,007 (Table S3). In contrast, there were on average only 27,968 SNVs in cultivated barley, indicating that the gene pool of cultivated barley is dramatically smaller than that of wild barley. Moreover, we built a synthetic assembly of the barley genome based on the WGS contigs assembly of cv. Morex, Barke, and Bowman. The 1.7-Gb (1,334,625 contigs) synthetic assembly of the barley genome (Dataset S1), which includes approximately one-third of the barley genome, has provided a detailed insight into the physical distribution of genes. By using the five accessions (B1K-04-12, Bowman, Barke, Igri, and Haruna Nijo) reported by the International Barley Genome Sequencing Consortium (15), we were able to analyze the genomic contribution of different wild-barley populations to the modern cultivars. In total, 111,121 SNVs with no missing data were detected in all of the 26 samples (Table S4), including 4,099 SNVs showing multiple variations in a single site. Because sites with missing data or multiple variants could make subsequent inferences unreliable, we only used 107,022 SNVs with no missing data in the 26 samples to assess the unique SNVs in cultivated barley (Fig. S1B) and in the two main wild-barley populations. Surprisingly, there were only 16,977 SNVs detected in all of the cultivated and wild barley in this study (Fig. S1B). The numbers of unique SNVs for wild barley from the Near East and Tibet were 37,998 and 14,724, respectively. However, the 13 cultivated barley genotypes only harbored 12,163 unique SNVs (Fig. S1B).
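The shared and unique SNV counts above amount to sorting each site by the set of groups in which the variant was observed. The following is a minimal sketch of that bookkeeping, not the authors' pipeline: the function name `partition_snvs`, the group labels, and the toy site data are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def partition_snvs(presence):
    """Count SNV sites by the exact set of groups carrying the variant.

    `presence` maps a site identifier to the set of group labels in which
    the variant allele was seen (hypothetical input layout).
    """
    counts = Counter()
    for groups in presence.values():
        counts[frozenset(groups)] += 1
    return counts


# Toy data (illustrative only, not from the paper).
presence = {
    "site1": {"cultivated", "near_east", "tibet"},
    "site2": {"near_east"},
    "site3": {"near_east"},
    "site4": {"tibet"},
    "site5": {"cultivated"},
    "site6": {"cultivated", "tibet"},
}

counts = partition_snvs(presence)
shared_all = counts[frozenset({"cultivated", "near_east", "tibet"})]
unique_near_east = counts[frozenset({"near_east"})]
print(shared_all, unique_near_east)
```

Keying the counter on a `frozenset` keeps every combination of groups distinct, so "unique to one group" and "shared by all groups" fall out of the same pass.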

Furthermore, we created a dataset containing 92,776 SNVs with no missing data, which were anchored to the synthetic assembly of the barley genome in all of the 26 genotypes, for the population structure analysis and genomic similarity test. All of these SNVs were randomly distributed on the seven chromosomes, with a mean of 13,254 SNVs or 1,678 genes in each chromosome (Table S4).

Population Structure and Grouping of Cultivated and Wild Barley. We constructed a neighbor-joining tree and conducted a population structure analysis and a principal component analysis (PCA) (Fig. 1) based on the dataset of 92,776 SNVs with no missing data (Table S4). Three major groups could be clearly observed from the neighbor-joining tree: the cultivars except Tibetan hulless barley (qingke), the wild barley from the Near East, and the wild barley and qingke from the Tibetan Plateau (Fig. 1A). According to the first and second eigenvectors of the PCA, all of the samples could be divided into cultivars and wild-barley groups, with the exception of qingke, which demonstrated a very close link to the Tibetan wild barley (Fig. 1B). Again, the results showed the cultivars had much less genetic diversity than the wild genotypes. Furthermore, we performed a population structure analysis to estimate individual ancestry and admixture proportions, assuming the existence of certain populations. By analyzing the number of clusters (K), we found a clear evolutionary divergence between cultivated and wild barley from the Near East and Tibet with K = 3 (Fig. 1C). Notably, qingke was attributed to the population of Tibetan wild barley. When K = 4, the cultivars from East Asia fell into a new subgroup within the cultivated barley population (Fig. 1C). With K values at 5 and 6, some Israel and Tibet wild barley could be separated, respectively.
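A neighbor-joining tree is built from a matrix of pairwise distances between accessions. As a minimal sketch, assuming the simplest possible measure (the proportion of SNV sites at which two accessions differ, often called the p-distance), the matrix can be computed as below. The distance model actually used for the tree is not stated in this section, and the accession names and allele strings are hypothetical.

```python
def p_distance(a, b):
    """Proportion of sites at which two equal-length allele strings differ."""
    if len(a) != len(b):
        raise ValueError("allele strings must cover the same sites")
    differences = sum(1 for x, y in zip(a, b) if x != y)
    return differences / len(a)


def distance_matrix(genotypes):
    """Pairwise p-distances, keyed by (accession, accession) tuples."""
    names = sorted(genotypes)
    return {(p, q): p_distance(genotypes[p], genotypes[q])
            for p in names for q in names}


# Toy alleles at five SNV sites (illustrative only).
genotypes = {
    "cultivar_a": "AAGGT",
    "wild_ne": "ACGCT",
    "wild_tibet": "AAGCT",
}

dist = distance_matrix(genotypes)
for pair in [("cultivar_a", "wild_ne"), ("cultivar_a", "wild_tibet")]:
    print(pair, dist[pair])
```

A matrix of this kind is the usual input to a neighbor-joining routine; the bootstrap values on the tree would come from recomputing it on resampled sites.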
Based on the results of population structure inference, we divided all of the wild-barley accessions into three groups: the wild barley from Tibet (Wb-T, including XZ2, XZ12, XZ15, XZ21, XZ174, and XZ181), the wild barley from the Near East group 1 (Wb-NE1, including ECI-6-0, Tabigha-T-0, and Tabigha-B-63), and the wild barley from the Near East group 2 (Wb-NE2, including ECI-2-0, Iran-6-26, Turkey-19-24, and B1K-04-12). All of the cultivated barley, except qingke, were classified into one group (Cultivar) to minimize the effect of genetic drift; this group may be considered representative of modern cultivated barley worldwide.

Genomic Origin of Modern Cultivated Barley. We used the same dataset containing 92,776 SNVs with no missing data to analyze genomic similarity between modern cultivated barley (Cultivar) and Wb-T, Wb-NE1, and Wb-NE2, respectively. The average heterozygosity of Cultivar, Wb-NE1, Wb-NE2, and Wb-T was 0.135, 0.091, 0.198, and 0.161, respectively. To avoid the occurrence of multiple variants at one site, we combined each group of barley accessions as a gene pool according to the methods described by Rubin et al. (24). We used 300-kb windows sliding in overlapping 150-kb steps to detect genomic regions of potential similarity, which gave the best coverage along the synthetic assembly of the barley genome between Cultivar and the three wild-barley groups. As a result, 600 windows with high similarity, containing a total of 21,880 SNVs, met the selection criteria; these windows account for ∼8.7% of the synthetic assembly of the barley genome. Among these windows, only 51 similar windows were detected between Cultivar and Wb-NE1. However, the number increased dramatically to 289 and 384 between Cultivar and Wb-NE2, and Wb-T, respectively. Meanwhile, a tight relationship between Wb-NE2 and Wb-T was found based on a total of 298 similar windows (Fig. S2). However, the numbers of similar windows between Wb-NE1 and Wb-NE2 and between Wb-NE1 and Wb-T were only 101 and 31, respectively. These results indicate that the domestication process might have happened in a certain population of wild barley from the Fertile Crescent.
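The window scan can be sketched as follows, using the thresholds given in Materials and Methods (300-kb windows, 150-kb step, windows with 20 or fewer SNVs dropped, similarity of at least 95% retained). This is an illustrative sketch rather than the authors' code: the function names and the toy data are hypothetical, and ties in the pooled allele are broken alphabetically here for reproducibility, whereas the paper selects among tied variants at random.

```python
from collections import Counter

WINDOW = 300_000        # window length in bp
STEP = 150_000          # slide step in bp (adjacent windows overlap by 150 kb)
MIN_SNVS = 21           # windows with <= 20 SNVs are removed
MIN_SIMILARITY = 0.95   # keep windows with similarity >= 95%


def pool_allele(alleles):
    """Majority allele of a group at one site.

    Assumption: ties are broken alphabetically to keep the sketch
    deterministic (the paper picks randomly among tied variants).
    """
    counts = Counter(alleles)
    top = max(counts.values())
    return min(a for a, n in counts.items() if n == top)


def similar_windows(positions, pool_a, pool_b, chrom_length):
    """Return (window_start, similarity) for every retained window.

    `positions` holds SNV coordinates; `pool_a` and `pool_b` hold the pooled
    alleles of two groups at those coordinates, in the same order.
    """
    kept = []
    start = 0
    while start < chrom_length:
        end = start + WINDOW
        idx = [i for i, p in enumerate(positions) if start <= p < end]
        if len(idx) >= MIN_SNVS:
            identical = sum(1 for i in idx if pool_a[i] == pool_b[i])
            similarity = identical / len(idx)
            if similarity >= MIN_SIMILARITY:
                kept.append((start, similarity))
        start += STEP
    return kept


# Toy chromosome: one SNV every 10 kb over 600 kb (illustrative only).
positions = list(range(0, 600_000, 10_000))
pool_a = ["A"] * 60
pool_b = ["A"] * 30 + ["C"] * 30   # identical to pool_a only in the first 300 kb

kept = similar_windows(positions, pool_a, pool_b, chrom_length=600_000)
print(pool_allele(["A", "A", "G"]), kept)
```

The sketch stops at per-pair similarity. Choosing, for each window, the single wild group with the highest similarity (the "unique" windows drawn in the Circos diagram) would be a further comparison across the three pairs.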

[Fig. 1 graphic not reproduced: (A) neighbor-joining tree of the 26 accessions with bootstrap values (scale bar, 0.05); (B) PCA plot; (C) population structure plots for K = 2 to K = 6.]

Fig. 1. Phylogenetic tree, PCA, and population structure analysis of wild and cultivated barley. The analyses were based on 92,776 SNVs randomly distributed in the seven barley chromosomes. (A) Phylogenetic tree constructed using the neighbor-joining method with 1,000 bootstraps. (B) PCA1, the first principal component; PCA2, the second principal component; green circles indicate barley cultivars (except qingke), which are referred to as a green branch in A. (C) Each color denotes one population in the population structure analysis. Each vertical bar represents one accession in which the percentages of contribution from the ancestral populations are indicated by the lengths of colored segments. The number of clusters (K) was set from 2 to 6.

Moreover, the Circos diagrams demonstrated a tight genomic relationship between cultivated barley and wild-barley groups (Fig. 2A). There were only 20 unique genomic regions between Cultivar and Wb-NE1 (Table S5). In contrast, 212 and 300 unique genomic regions were detected between Cultivar and Wb-NE2 and between Cultivar and Wb-T, respectively, showing a mosaic distribution in each of the chromosomes (Fig. 2A). Meanwhile, the length of unique genomic regions was summarized and the genomic similarity rate of Cultivar to each wild group was calculated based on the length of unique genomic regions (Fig. 2B). The highest genomic similarity rates between Cultivar and Wb-NE2, at 59.5%, 51.9%, and 49.6%, are found in chromosomes 1H, 2H, and 3H, respectively. In contrast, the highest genomic similarity rates between Cultivar and Wb-T, at 55.1%, 65.3%, 49.1%, and 55.2%, are detected only in chromosomes 4H, 5H, 6H, and 7H, respectively.
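Because adjacent windows overlap by 150 kb, summing window lengths directly would count shared stretches twice. A minimal sketch of a length-based rate, assuming windows are given as half-open (start, end) intervals, merges overlapping windows before measuring them. The text does not spell out the denominator of the rate, so the sketch takes it as a parameter; the function names and the toy intervals are hypothetical.

```python
def merged_length(intervals):
    """Total length covered by half-open (start, end) intervals.

    Overlapping or touching intervals are merged so that shared stretches
    are counted once.
    """
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def similarity_rates(windows_by_group, denominator):
    """Percentage of `denominator` (in bp) covered by each group's unique windows.

    Assumption: the caller supplies the reference length to divide by.
    """
    return {group: 100 * merged_length(windows) / denominator
            for group, windows in windows_by_group.items()}


# Toy windows on one chromosome (illustrative only).
windows_by_group = {
    "Wb-T": [(0, 300_000), (150_000, 450_000)],
    "Wb-NE2": [(900_000, 1_200_000)],
}
rates = similarity_rates(windows_by_group, denominator=1_500_000)
print(rates)
```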

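[Fig. 2A graphic (Circos diagram) not reproduced.]

Fig. 2B. Genomic similarity (%) between Cultivar and each wild-barley group.

Chromosome   Wb-T   Wb-NE1   Wb-NE2
1H           33.8   0.0      59.5
2H           30.2   4.7      51.9
3H           36.4   2.5      49.6
4H           55.1   7.1      29.6
5H           65.3   1.1      24.4
6H           49.1   4.1      32.5
7H           55.2   4.9      26.5
Total        48.8   3.6      36.1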

Fig. 2. Genomic similarity between cultivated and wild barley. (A) Circos diagram shows the seven chromosomes (1H∼7H) of barley in each of the four groups: green, cultivated barley (Cultivar); yellow, wild barley from the Near East group 1 (Wb-NE1); blue, wild barley from the Near East group 2 (Wb-NE2); red, the wild barley from Tibet (Wb-T). The number on each chromosome indicates genomic position on the synthetic assembly of the barley genome (Mb). The similar blocks are connected with lines and each line represents one unique window (300 kb) of the genome with highest similarity between Cultivar and the three wild-barley groups. (B) Genomic similarities between Cultivar and three wild-barley groups based on the total length of unique windows, excluding the overlap of slide windows.

To validate the above results, we also conducted a case study on five additional cultivars (Cultivar-additional: cv. Borwina, Kindred, Vogelsanger Gold, Steptoe, and Harrington) with exome capture data (25). After aligning the reads in each sample against the WGS contigs of cv. Morex, and matching with our existing dataset of 92,776 SNVs, 54,290 SNVs were detected with no missing data in all of the five additional cultivars. Cluster analysis, using 54,290 SNVs, produced a neighbor-joining tree (Fig. S3A) similar to Fig. 1A. Therefore, we conducted a genomic similarity analysis between Cultivar-additional and each wild-barley group. A total of 583 windows with high similarity were detected, where 277 and 392 similar windows were found between Cultivar-additional and Wb-NE2, and Wb-T, respectively. Consistent with Fig. 2A, the unique genomic regions in the Circos diagrams also showed a mosaic distribution in each chromosome (Fig. S3B).

Discussion

Genome sequencing has provided many new insights into the domestication of barley. However, because of the large genome of barley (14), RNA-Seq has quickly become an alternative and effective method to study the domestication of barley. We have generated comprehensive transcriptome profiles for the detection of genetic diversity in exons of wild and cultivated barley (Tables S1–S3). The results show that only 42.2% of the SNVs were detected in 13 representative cultivated barley accessions from Tibet, East Asia, and Europe (Fig. S1B), including 11.3% SNVs unique to the cultivated barley; some of them might be false-positives caused by missing data of wild barley. Hence, we suggest that the gene pool of the cultivated barley is derived from wild barley from both the Near East and Tibet. This suggestion is strongly supported by the fact that 8,177 (7.6%) and 7,913 (7.4%) of the SNVs detected in cultivated barley originated from the wild barley from the Near East and Tibet, respectively (Fig. S1B). In contrast, we identified 57.8% SNVs unique to 13 wild-barley accessions (Fig. S1B), indicating that more than half of the genetic diversity in exons has been lost during barley domestication or modern breeding. This finding also indicates that wild barley has a much larger gene pool than cultivated barley and contains more potentially useful genetic resources for barley improvement (4, 14, 21).

Moreover, some genetic studies have demonstrated that wild barley consists of several genetically differentiated populations (3, 4). Our results showed the wild-barley accessions from Iran and Turkey were closely related to the Tibetan wild barley, whereas some Israeli wild-barley accessions showed large genetic and evolutionary distances from the rest of wild barley (Fig. 1 A and B). Based on the results of population structure, we divided all of the wild-barley accessions into three groups: Wb-T accessions from the Tibetan Plateau and its vicinity (11, 12); Wb-NE1 accessions from Israel (26–28); and four Wb-NE2 accessions: B1K-04-12 (21) and ECI-2-0 (27) from Israel, Iran-6-26 from Gawdar, Iran (29), and Turkey-19-24 from Bitlis, Turkey (30) (Table S1). The results suggested that the wild barley from the Near East has wide genetic diversity (21, 26–30) and could be divided into two or more subgroups. They are also consistent with the report that incipient sympatric speciation might begin to occur between the two opposite slopes of "Evolution Canyon" at Mount Carmel, Israel, despite their geographical proximity (31).

Domestication of crops and expansion of agriculture have fundamentally reshaped the course of human history and many other living organisms (1, 32–34). The introduction and establishment of barley cultivation into the Plateau has provided a staple food for the livelihood of pioneer Tibetan settlers and for the subsequent rapid population growth on the Tibetan Plateau (35). Hulless barley (qingke) has been inferred to have a single origin of domestication, based on the nud gene (36). Our results show that qingke is different from modern cultivated barley at the genomic level (Fig. 1 A and B), but has a close genetic relationship with wild barley, especially the Tibetan wild barley (Fig. 1B). Thus, we propose that qingke probably existed in an early stage of domestication. However, the genomic similarity between modern cultivated barley and qingke cannot be assessed efficiently using the current data because of limited samples. It is also difficult to exclude the possibility of gene flow between altitude-adapted Tibetan wild barley and introduced domesticated barley, which might have allowed early Tibetan farmers to develop varieties that were well adapted to local climatic conditions in the Plateau (37). Thus, qingke was not included in the genomic similarity analysis.

It has been reported that about half of the sequenced loci from wild barley exhibit significant differentiation between the eastern and western portions of the species range (7, 38). Based on the released WGS contigs assembly of cv.
Morex, Barke, and Bowman (15), and the synthetic assembly of the barley genome (Dataset S1), we were able to estimate the genomic contribution of different wild-barley populations to the representatives of modern cultivars (Tables S1 and S2). Interestingly, the origin of the cultivated barley genome showed much higher portions from chromosomes 1H–3H in Wb-NE2, and from chromosomes 4H–7H in Wb-T, respectively, according to the length of unique windows between Cultivar and wild-barley groups (Fig. 2). This finding was not a coincidence, as supported by the results obtained in many recent studies. Some genes related to spike morphology and flowering time, the main traits of early domestication, have been cloned from chromosome 2H: genes such as Ppd-H1 (39), vrs1 (40), HvAP2 (41, 42), and HvCEN (43). On the other hand, VRN-H3, another early domesticated gene conferring a spring growth habit (44), has been cloned from chromosome 7H, which coincides with the spring-type of Tibetan wild barley (12). Most importantly, tough (nonbrittle) rachis, one of the earliest domesticated traits, encoded by two closely linked complementary genes, has been mapped to chromosome 3H, supporting the concept of polyphyletic domestication of cultivated barley (6, 9). Similarly, our data show that the difference of genomic contribution between two wild-barley populations was smaller in chromosome 3H than in other chromosomes (Fig. 2). Accordingly, it may be concluded that the genome of modern cultivated barley originated from two major wild-barley populations, one from the Near East Fertile Crescent and the other from the Tibetan Plateau, with different contributions on each chromosome. Our case study on five cultivated barley accessions also verified this finding.

Because of the unavoidable gene flow between early domesticated genotypes and their wild ancestors (45, 46), the domestication process of barley is more complex than previously expected. The current data cannot exclude the possibility that the Tibetan wild-barley genome was merged into cultivated barley during a second domestication following the first domestication that occurred in the Fertile Crescent. Hence, further evidence is required through a comprehensive genomic investigation of wild and cultivated barley from the Near East Fertile Crescent and Tibetan Plateau.

Materials and Methods

Barley Plants. We used 12 wild (H. spontaneum) (Table S1) and 9 cultivated barley plants (Table S2) to conduct an RNA-Seq analysis. Six wild-barley accessions from the Near East were collected and previously characterized by Nevo et al. (26–28). Six wild-barley genotypes were from the collection of Xu, gathered since the 1960s from the extensive area of the Tibetan Plateau (11, 12), and evaluated by Dai et al. (14, 47) and Qiu et al. (48). Among the cultivated barley genotypes used in this study, Padanggamu, Beiqing5, and Himala2 were hulless, also called qingke, and cultivated on the Tibetan Plateau and its vicinity. Three cultivars from East Asia and three cultivars from Europe were reported in our previous work (14). Moreover, for comparative studies, one wild barley (B1K-04-12) and four cultivated barley (Bowman, Barke, Igri, and Haruna Nijo) with WGS sequence were selected as references, which have been investigated by the International Barley Genome Sequencing Consortium (15) (Tables S1 and S2).

Phylogenetic, Population Structure, and PCA Analysis. A dataset of 92,776 SNVs with no missing data anchored to the synthetic assembly of the barley genome was used to conduct phylogenetic analysis, population structure analysis, and PCA. The phylogenetic tree of the 26 accessions was constructed using the MEGA 5.1 program (49) with the neighbor-joining method (1,000 bootstraps). FRAPPE (50) was used to investigate the population structure based on a maximum-likelihood method, with 10,000 iterations, and the number of clusters (K) was set from 2 to 6. Moreover, we performed a PCA using EIGENSOFT (51).

Genomic Similarity Analysis. The same dataset was used for genomic similarity analysis as an addition to the phylogenetic analysis. Each type of SNV with a known site referring to the synthetic assembly of the barley genome was counted for four groups: Cultivar, Wb-T, Wb-NE1, and Wb-NE2. Therefore, barley gene pools for the four groups were constructed according to Rubin et al. (24). Briefly, for all sites with two or more variants, the minority variants were treated as errors rather than using the reference base. If the variants in a certain site had the same frequency, they were selected randomly. Finally, we constructed four (Cultivar, Wb-T, Wb-NE1, and Wb-NE2) barley gene pools, where all of the SNVs were linearly distributed on the seven chromosomes of the synthetic assembly of the barley genome. Moreover, average heterozygosity for the four barley groups was estimated according to Nei (52).

To maximize the genomic coverage and maintain high accuracy, we used 300-kb windows sliding in overlapping 150-kb steps along the synthetic assembly of the barley genome to study the genomic similarity. Large windows usually have poor coverage and small windows do not contain enough SNVs, leading to inaccuracy in the similarity analysis. The number of SNVs was counted for each window, and windows with ≤ 20 SNVs were removed. We selected the windows with no missing data in all of the three pairs between Cultivar and Wb-T, Wb-NE1, or Wb-NE2 along the synthetic assembly of the barley genome for further analysis. The number of identical SNVs in each window was counted in all of the three pairs. Then, the similarity of each window was calculated in all of the three pairs as the number of identical SNVs divided by the total number of SNVs in the window. Windows with similarity ≥95% were kept. Finally, unique windows with the highest similarity among the three pairs were selected for the visualization of synthetic relationships using Circos (53). Additional experimental details on sample and cDNA library preparation, deep sequencing, read mapping and de novo assembly, and SNV or indel calling can be found in SI Materials and Methods.

ACKNOWLEDGMENTS. We thank Hangzhou Guhe Information and Technology Co., Ltd, for help in sequencing and bioinformatics analysis, Profs. A. Beiles and A. Beattie for critical reading of the manuscript, and Prof. D. F. Sun for providing seeds of Tibetan wild barley. This study was supported by the Natural Science Foundation of China (31330055, 31201166, 31301246, and 31171544).

1. Brown TA, Jones MK, Powell W, Allaby RG (2009) The complex origins of domesticated crops in the Fertile Crescent. Trends Ecol Evol 24(2):103–109.
2. Allaby RG, Fuller DQ, Brown TA (2008) The genetic expectations of a protracted model for the origins of domesticated crops. Proc Natl Acad Sci USA 105(37):13982–13986.
3. Zohary D, Hopf M, Weiss E (2012) Domestication of Plants in the Old World: The Origin and Spread of Domesticated Plants in Southwest Asia, Europe, and the Mediterranean Basin (Oxford Univ Press, Oxford).
4. Nevo E (2006) Genome evolution of wild cereal diversity and prospects for crop improvement. Plant Genet Resour 4(1):36–46.
5. Harlan JR, Zohary D (1966) Distribution of wild wheats and barley. Science 153(3740):1074–1080.
6. Zohary D (1999) Monophyletic vs. polyphyletic origin of the crops on which agriculture was founded in the Near East. Genet Resour Crop Evol 46(2):133–142.
7. Morrell PL, Clegg MT (2007) Genetic evidence for a second domestication of barley (Hordeum vulgare) east of the Fertile Crescent. Proc Natl Acad Sci USA 104(9):3289–3294.
8. Nevo E, Korol AB, Beiles A, Fahima T (2002) Evolution of Wild Emmer and Wheat Improvement. Population Genetics, Genetic Resources, and Genome Organization of Wheat’s Progenitor, Triticum dicoccoides (Springer, Berlin).
9. Azhaguvel P, Komatsuda T (2007) A phylogenetic analysis based on nucleotide sequence of a marker linked to the brittle rachis locus indicates a diphyletic origin of barley. Ann Bot (Lond) 100(5):1009–1015.
10. Molina-Cano JL, et al. (2005) Chloroplast DNA microsatellite analysis supports a polyphyletic origin for barley. Theor Appl Genet 110(4):613–619.
11. Xu TW (1975) On the origin and phylogeny of cultivated barley with preference to the discovery of Ganze wild two-rowed barley Hordeum spontaneum c. Koch. Acta Genet Sin 2(2):129–137.
12. Xu TW (1982) Origin and evolution of cultivated barley in China. Acta Genet Sin 9:440–446.
13. von Bothmer R, Komatsuda T (2011) in Barley: Production, Improvement, and Uses, ed Ullrich SE (Wiley-Blackwell, Chichester, UK), pp 14–62.
14. Dai F, et al. (2012) Tibet is one of the centers of domestication of cultivated barley. Proc Natl Acad Sci USA 109(42):16969–16973.
15. Mayer KF, et al.; International Barley Genome Sequencing Consortium (2012) A physical, genetic and functional sequence assembly of the barley genome. Nature 491(7426):711–716.
16. Wang Z, Gerstein M, Snyder M (2009) RNA-Seq: A revolutionary tool for transcriptomics. Nat Rev Genet 10(1):57–63.
17. Marioni JC, Mason CE, Mane SM, Stephens M, Gilad Y (2008) RNA-seq: An assessment of technical reproducibility and comparison with gene expression arrays. Genome Res 18(9):1509–1517.
18. Mizuno H, et al. (2010) Massive parallel sequencing of mRNA in identification of unannotated salinity stress-inducible transcripts in rice (Oryza sativa L.). BMC Genomics 11:683–695.
19. Swanson-Wagner R, et al. (2012) Reshaping of the maize transcriptome by domestication. Proc Natl Acad Sci USA 109(29):11878–11883.
20. Kohl S, et al. (2012) A putative role for amino acid permeases in sink-source communication of barley tissues uncovered by RNA-seq. BMC Plant Biol 12:154.
21. Hübner S, et al. (2009) Strong correlation of wild barley (Hordeum spontaneum) population structure with temperature and precipitation variation. Mol Ecol 18(7):1523–1536.

22. Martin JA, Wang Z (2011) Next-generation transcriptome assembly. Nat Rev Genet 12(10):671–682.
23. Matsumoto T, et al. (2011) Comprehensive sequence analysis of 24,783 barley full-length cDNAs derived from 12 clone libraries. Plant Physiol 156(1):20–28.
24. Rubin CJ, et al. (2010) Whole-genome resequencing reveals loci under selection during chicken domestication. Nature 464(7288):587–591.
25. Mascher M, et al. (2013) Barley whole exome capture: A tool for genomic research in the genus Hordeum and beyond. Plant J 76(3):494–505.
26. Nevo E, Zohary D, Brown AHD, Haber M (1979) Genetic diversity and environmental associations of wild barley, Hordeum spontaneum, in Israel. Evolution 33(3):815–833.
27. Nevo E, Apelbaum-Elkaher I, Garty J, Beiles A (1997) Natural selection causes microscale allozyme diversity in wild barley and a lichen at “Evolution Canyon” Mt. Carmel, Israel. Heredity 78(4):373–382.
28. Nevo E, Brown AHD, Zohary D, Storch N, Beiles A (1981) Microgeographic edaphic differentiation in allozyme polymorphisms of wild barley (Hordeum spontaneum, Poaceae). Plant Syst Evol 138(3-4):287–292.
29. Nevo E, Beiles A, Kaplan D, Storch N, Zohary D (1986a) Genetic diversity and environmental associations of wild barley, Hordeum spontaneum (Poaceae), in Iran. Plant Syst Evol 153(3-4):141–164.
30. Nevo E, Zohary D, Beiles A, Kaplan D, Storch N (1986b) Genetic diversity and environmental associations of wild barley, Hordeum spontaneum, in Turkey. Genetica 68(3):203–213.
31. Parnas T (2006) Evidence for incipient sympatric speciation in wild barley, Hordeum spontaneum, at “Evolution Canyon”, Mount Carmel, Israel, based on hybridization and physiological and genetic diversity estimates. Master's thesis (Univ of Haifa, Haifa, Israel).
32. Purugganan MD, Fuller DQ (2009) The nature of selection during plant domestication. Nature 457(7231):843–848.
33. Riehl S, Zeidi M, Conard NJ (2013) Emergence of agriculture in the foothills of the Zagros Mountains of Iran. Science 341(6141):65–67.
34. Skoglund P, et al. (2012) Origins and genetic legacy of Neolithic farmers and hunter-gatherers in Europe. Science 336(6080):466–469.
35. Qi X, et al. (2013) Genetic evidence of paleolithic colonization and neolithic expansion of modern humans on the Tibetan plateau. Mol Biol Evol 30(8):1761–1778.
36. Taketa S, et al. (2008) Barley grain with adhering hulls is controlled by an ERF family transcription factor gene regulating a lipid biosynthesis pathway. Proc Natl Acad Sci USA 105(10):4062–4067.
37. Guedes J, et al. (2014) Moving agriculture onto the Tibetan Plateau: The archaeobotanical evidence. Archaeol Anthropol Sci, 10.1007/s12520-013-0153-4.


38. Morrell PL, Lundy KE, Clegg MT (2003) Distinct geographic patterns of genetic diversity are maintained in wild barley (Hordeum vulgare ssp. spontaneum) despite migration. Proc Natl Acad Sci USA 100(19):10812–10817.
39. Turner A, Beales J, Faure S, Dunford RP, Laurie DA (2005) The pseudo-response regulator Ppd-H1 provides adaptation to photoperiod in barley. Science 310(5750):1031–1034.
40. Komatsuda T, et al. (2007) Six-rowed barley originated from a mutation in a homeodomain-leucine zipper I-class homeobox gene. Proc Natl Acad Sci USA 104(4):1424–1429.
41. Houston K, et al. (2013) Variation in the interaction between alleles of HvAPETALA2 and microRNA172 determines the density of grains on the barley inflorescence. Proc Natl Acad Sci USA 110(41):16675–16680.
42. Nair SK, et al. (2010) Cleistogamous flowering in barley arises from the suppression of microRNA-guided HvAP2 mRNA cleavage. Proc Natl Acad Sci USA 107(1):490–495.
43. Comadran J, et al. (2012) Natural variation in a homolog of Antirrhinum CENTRORADIALIS contributed to spring growth habit and environmental adaptation in cultivated barley. Nat Genet 44(12):1388–1392.
44. Yan L, et al. (2006) The wheat and barley vernalization gene VRN3 is an orthologue of FT. Proc Natl Acad Sci USA 103(51):19581–19586.
45. Ellstrand NC, Prentice HC, Hancock JF (1999) Gene flow and introgression from domesticated plants into their wild relatives. Annu Rev Ecol Syst 30:539–563.
46. Hübner S, et al. (2012) Islands and streams: clusters and gene flow in wild barley populations from the Levant. Mol Ecol 21(5):1115–1129.
47. Dai F, et al. (2010) Differences in phytase activity and phytic acid content between cultivated and Tibetan annual wild barleys. J Agric Food Chem 58(22):11821–11824.
48. Qiu L, et al. (2011) Evaluation of salinity tolerance and analysis of allelic function of HvHKT1 and HvHKT2 in Tibetan wild barley. Theor Appl Genet 122(4):695–703.
49. Tamura K, et al. (2011) MEGA5: Molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Mol Biol Evol 28(10):2731–2739.
50. Tang H, Peng J, Wang P, Risch NJ (2005) Estimation of individual admixture: Analytical and study design considerations. Genet Epidemiol 28(4):289–301.
51. Price AL, et al. (2006) Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet 38(8):904–909.
52. Nei M (1978) Estimation of average heterozygosity and genetic distance from a small number of individuals. Genetics 89(3):583–590.
53. Krzywinski M, et al. (2009) Circos: An information aesthetic for comparative genomics. Genome Res 19(9):1639–1645.
