Please cite this article in press as: Arbiza et al., Contrasting X-Linked and Autosomal Diversity across 14 Human Populations, The American Journal of Human Genetics (2014), http://dx.doi.org/10.1016/j.ajhg.2014.04.011

ARTICLE Contrasting X-Linked and Autosomal Diversity across 14 Human Populations Leonardo Arbiza,1,2 Srikanth Gottipati,1,2 Adam Siepel,1 and Alon Keinan1,* Contrasting the genetic diversity of the human X chromosome (X) and autosomes has facilitated understanding historical differences between males and females and the influence of natural selection. Previous studies based on smaller data sets have left questions regarding how empirical patterns extend to additional populations and which forces can explain them. Here, we address these questions by analyzing the ratio of X-to-autosomal (X/A) nucleotide diversity with the complete genomes of 569 females from 14 populations. Results show that X/A diversity is similar within each continental group but notably lower in European (EUR) and East Asian (ASN) populations than in African (AFR) populations. X/A diversity increases in all populations with increasing distance from genes, highlighting the stronger impact of diversity-reducing selection on X than on the autosomes. However, relative X/A diversity (between two populations) is invariant with distance from genes, suggesting that selection does not drive the relative reduction in X/A diversity in non-Africans (0.842 5 0.012 for EUR-to-AFR and 0.820 5 0.032 for ASN-to-AFR comparisons). Finally, an array of models with varying population bottlenecks, expansions, and migration from the latest studies of human demographic history account for about half of the observed reduction in relative X/A diversity from the expected value of 1. They predict values between 0.91 and 0.94 for EUR-to-AFR comparisons and between 0.91 and 0.92 for ASN-to-AFR comparisons. Further reductions can be predicted by more extreme demographic events in excess of those captured by the latest studies but, in the absence of these, also by historical sex-biased demographic events or other processes.

Introduction Comparisons of the genetic diversity of the human X chromosome (X) and autosomes have received ample recent attention.1–18 Such comparisons can facilitate understanding historical differences between males and females and the forces of natural selection. In a population that has been of constant size throughout history and has had identical distribution of offspring numbers between males and females in every generation, the effective population size (Ne) of X—and hence its nucleotide diversity estimated on the basis of the discovery of SNPs from sequencing—is expected to be three-quarters of that of the autosomes. However, deviations from this ratio are expected by at least four forces: (1) differential mutation rates between X and the autosomes, due at least in part to different mutation rates between males and females15,19,20 (we controlled for this in our analyses by normalizing human nucleotide diversity by genetic divergence between humans and an outgroup species1,2,6–8,12); (2) sex-biased demographic events or social practices leading to different Ne of males and females;2,4,7,16,21–24 (3) changes in historical Ne over time, even in the face of equal male and female Ne, given that the larger Ne of the autosomes implies deeper genealogies and therefore a greater impact of more ancient evolutionary history;2,25 and (4) forces of natural selection differentially affecting X and the autosomes, mostly, but not only, because of the exposure of otherwise recessive X-linked variants in hemizygous males.11,16,17,26–30

The ratio of X-to-autosomal (X/A) nucleotide diversity, when diversity estimates are normalized by divergence for controlling for deviations due to (1) above, can serve as a proxy for the ratio of effective population sizes between X and the autosomes (Nx / NA). Recent studies by Hammer et al. have observed X/A diversity in human populations to be between 0.8 and 1.08 in intergenic regions and 1.11 in regions very far from genes.7,8 A different set of studies estimated genome-wide X/A diversity to be 0.61 and 0.73 in intergenic regions in northern Europeans and West Africans, respectively, and 0.75 and 0.88, respectively, in regions very far from genes.1,2 Both sets of studies have observed a clear positive correlation between X/A diversity and genetic distance from the nearest protein-coding gene, which suggests that diversityreducing selection has left a more pronounced signature on X.1,8 Significant differences in X/A diversity between northern Europeans and West Africans were observed with no support for a correlation between this difference and genetic distance from the nearest gene.1,2 Hence, these population-specific differences have been suggested to be due to demographic rather than selective forces.1,2 Similar conclusions of a reduction in Nx / NA in Europeans relative to West Africans have been drawn by various groups from analyses of other patterns of genetic variation, including interpopulation allele frequency differences and the distribution of allele frequencies within populations.2,17,18,31 Several questions remain after these studies of relative diversity on X and the autosomes. Perhaps the most

1

Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, NY 14853, USA These authors contributed equally to this work *Correspondence: [email protected] http://dx.doi.org/10.1016/j.ajhg.2014.04.011. Ó2014 The Authors 2

This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/3.0/).

The American Journal of Human Genetics 94, 1–18, June 5, 2014 1

Please cite this article in press as: Arbiza et al., Contrasting X-Linked and Autosomal Diversity across 14 Human Populations, The American Journal of Human Genetics (2014), http://dx.doi.org/10.1016/j.ajhg.2014.04.011

notable question is what the patterns are in other human populations around the globe. Sequencing-based studies were based on either six individuals of European descent8 or 69 females from only two populations of northwestern European (CEU [Utah residents with ancestry from northern and western Europe from the CEPH collection]) and West African (YRI [Yoruba in Ibadan, Nigeria]) ancestry.1 The recent availability of population-scale whole-genome sequencing from additional ancestries facilitates testing for differences in X/A diversity between populations, both inter- and intracontinentally, and between them and both CEU and YRI populations. Specifically, we sought to test whether empirical observations based on CEU and YRI populations are due more generally to differences between African and non-African populations. Another question of crucial importance concerns the range of explanations consistent with differences in X/A diversity between different populations. Each of the three aforementioned forces that affect normalized X/A diversity (2–4), including population-specific historical patterns of natural selection, the changing of Ne over time due to population bottlenecks, expansion, and migration, and sexbiased demographic events, can potentially play a part in explaining the observations. The larger sample of genome-wide sequencing, including additional populations, improves our ability to disentangle these different forces by increasing the resolution of comparisons between empirical estimates and between them and theoretical predictions from models of human demographic history. To address these open questions, in this study we analyzed population-scale whole-genome sequencing data from the 1000 Genomes Project.32 These data consist of 14 populations spanning Europe, East Asia, sub-Saharan Africa, and the Americas for a total of 569 whole genomes from females (out of 1,092 individuals in the data overall). On the basis of these, we revisited—with higher resolution and larger geographic range—the comparisons between X and the autosomes and the study of diversity as a function of distance from genes. Similarly to the previous studies,1,8 we focused on population-level measures of nucleotide diversity. For all populations, as well as on a continental level, we estimated nucleotide diversity on each of X and the autosomes, in addition to their ratio (absolute X/A diversity), which follows normalization of each by genetic divergence from an outgroup. We further estimated relative X/A diversity as the absolute X/A diversity between each pair of populations and each pair of ancestry-based continental groups. This relative estimate is insensitive to differences between X and the autosomes in mutation rates, SNP calling, and ascertainment biases.1 Finally, to disentangle the effects of natural selection from those of demography, we considered the behavior of all these quantities as a function of genetic distance from genes. Overall, our results show that estimates of X/A diversity of non-African populations are lower than those of African

2

The American Journal of Human Genetics 94, 1–18, June 5, 2014

populations. The impact of selection on patterns of variation around genes was more pronounced on X than on the autosomes, although it was generally proportional between X and the autosomes in different populations, suggesting that factors other than local impact of selection need to be invoked to explain differences in X/A diversity between populations. Comparing empirical estimates in African and non-African populations with the reductions that can be expected from historical changes in population size according to a variety of recent models of human demography50,56–61 shows that these cannot fully explain the reduction in estimates of X/A diversity out of Africa. Alternative factors that can explain or play a part in explaining the observations include the previously modeled male-dominated migrations during the Out-of-Africa event,2,4 strong demographic events in excess of what is captured by studies of human demography, sex-biased changes in generation time in nonAfrican populations,33 or social-structure changes leading to variation in reproductive success between males and females.2,4,21,33

Material and Methods Population-Scale Sequence Data and SNP Calling In order to characterize and contrast patterns of diversity between X and the autosomes in a wide sample of global populations, we turned to the rich catalog of SNP and genotype calls released by phase 1 of the 1000 Genomes Project (see Web Resources).32 This data set consists of whole-genome sequencing of 14 individuals from worldwide populations from Europe, East Asia, subSaharan Africa, and the Americas for a combined sample size of 1,092 unrelated individuals. We used the consensus set of SNP calls merged from the independent calling from each of five centers (Broad Institute, Baylor College of Medicine, the Sanger Institute, the University of Michigan, and the NCBI).32 SNP calls on X included sites called by the University of Michigan, which modeled females as diploids and males as haploids.32 To minimize ascertainment biases, we controlled for the different coverage and different chromosomal sample size between the autosomes and X by performing all analyses only on the subset of 569 females from across the 14 populations (Table S1, available online).

Genomic Regions and Filters To obtain the set of SNPs and regions used in the analysis, we applied several genomic and data-quality filters to minimize sequencing, alignment, variant inference, and genomic assembly errors, as well as to exclude candidate regions under the direct influence of selective constraints. For filtering, we combined several tracks from build hg19 of the UCSC Genome Browser;34 these included repeats according to the Tandem Repeats Finder35 (simple-repeats track), regions with high CpG content (CpG-island track36), centromeres and telomeres along with 2 Mb flanking regions, gaps in the human assembly, mammalian conserved noncoding elements (PhastConsElements44WayPlacental track37), pseudoautosomal regions on X, segmental duplications,38 and regions corresponding to gene transcripts defined by a union of UCSC,39 RefSeq,40 and GenCode41 gene tracks. Finally, we considered only regions with conserved synteny (syntenic-net tracks42)

Please cite this article in press as: Arbiza et al., Contrasting X-Linked and Autosomal Diversity across 14 Human Populations, The American Journal of Human Genetics (2014), http://dx.doi.org/10.1016/j.ajhg.2014.04.011

and excluded gaps in human-outgroup alignments (derived from the 44-way multiz-alignment track43,44) for rhesus macaque45 and orangutan46 (rheMac2 and ponAbe2 assemblies). In addition to applying these genome-wide filters, we also selected individual SNPs to exclude low-quality bases as determined by the strict ‘‘callability mask’’ released by the 1000 Genomes Project.32 After applying this strict set of genome-wide filters, we kept a total of 27.6 and 470.7 Mb of sequence for further analysis on X and the autosomes, respectively. Although this represents a genomic fraction similar to that considered in our previous study1 with data from the pilot phase of the 1000 Genomes Project, the total number of SNPs passing quality control was 270,592 on X and 6,737,693 on the autosomes, providing a richer catalog than previously available1 while allowing for more stringent quality controls.

Estimates of Nucleotide Diversity We used the final filtered set of SNPs and genomic regions from X and the autosomes to build a collection of 100 kb loci defined by grouping bases along chromosomes that did not meet any of the above filtering criteria, i.e., without restricting bases to being contiguous. For each of these loci, and each population, we estimated nucleotide diversity (p; heterozygosity) as the average number of pairwise differences per base. Similarly, we estimated diversity for ancestry-based continental groups (see Table S1) by first pooling individuals from each continental group and then estimating p as above. To account for both local and larger-scale differences in mutation rates between X and the autosomes, we estimated human-outgroup divergence (D) for each 100 kb locus as the fraction of differences between the human reference and either orangutan46 or macaque45 and used a Jukes-Cantor correction for recurrent mutations.47 We then used divergence-normalized estimates (p/D) for each of X and the autosomes to compute the average X/A diversity for individual populations, henceforth referred to as absolute X/A diversity. We also calculated the ratio of these estimates of absolute X/A diversity between pairs of populations, henceforth called relative X/A diversity. Because relative X/A diversity is a ratio of two estimates for the exact same set of loci, D cancels out between the numerator and the denominator. We obtained SEMs by a moving block bootstrap procedure,48,49 where we generated 1,000 random data sets by resampling from the full set of 100 kb loci.50 This resampling approach allows for the SEM to account for the structure of correlation among nearby regions due to linkage disequilibrium (LD). We used the same bootstrapping procedure to obtain SEs for each of the estimates of nucleotide diversity in X and the autosomes, absolute X/A diversity, and relative X/A diversity.

Impact of Ancestral Polymorphism on Absolute X/A Diversity To account for differences in mutation rate, comparisons of diversity between X and the autosomes have relied on normalization of each by divergence from an outgroup1,2,8 as described above. Beyond this correction, we considered here the effect that ancestral polymorphism (polymorphic sites in the population ancestral to humans and the outgroup) can have on the correction by divergence and how it sheds light on differences in estimates between previous studies that used different outgroups.12 The different Ne of X and the autosomes leads to differential levels of ancestral polymorphisms—as a result of shorter coalescent time of X in

the ancestral population—and therefore a bias in absolute X/A diversity when ancestral polymorphism is unaccounted for. Hence, we corrected for the expected magnitude of this effect when considering either human-orangutan or human-macaque divergence. Specifically, absolute X/A diversity can be written as (pX / DX) / (pA / DA), for which p and D are defined above. Standard theory entails that E(p) ¼ 4Nem and E(D) ¼ 2m (t þ 2Nanc), where m is the mutation rate per base pair per generation (here assumed to be constant over time), t is the number of generations since speciation (assumed to be equal for both X and the autosomes), and Nanc is the effective population in the ancestral population. These entail that   t 1þ  2NAanc : absolute X A diversity ¼  t 1þ 2NXanc Even in the case where NX / NA ¼ NXanc / NAanc ¼ 0.75, the estimator of absolute X/A diversity deviates from the expectation of 0.75 because of the contribution of ancestral polymorphism depending on the relative magnitudes of t and Nanc.51,52 Several recent studies have estimated t and Nanc for different branches of the primate phylogeny,46,51,53 although it has proven challenging to estimate Nanc in the deeper branches. Using estimates from the best-fitting models of Burgess and Yang,51 we computed the overestimation of X/A diversity when considering human-orangutan and human-macaque divergence. For the orangutan, t and Nanc were estimated at 14.6–22 million years ago (Ma) and 84,000–126,000, respectively.51 Under the assumption of 20 years/generation, these would both produce estimates of absolute X/A diversity of 0.787, corresponding to an inflation of 4.9% above 0.75. Similarly, for macaque (t ¼ 29.2 Ma and Nanc ¼ 83,333),51 the respective inflation is 2.1%. We incorporated the additional correction of absolute X/A diversity by 4.9% and 2.1% throughout. Importantly, ancestral polymorphism does not affect relative X/A diversity given that it is identical irrespective of the human populations compared.

Variation in Estimates with Genetic Distance from the Nearest Gene In order to explore how the impact of natural selection on linked sites affects estimates of nucleotide diversity, absolute X/A diversity, and relative X/A diversity, we evaluated each as a function of genetic distance from the nearest protein-coding gene. We considered genes as the union of all transcripts from the RefSeq genes set40 and translated physical distances from these to genetic distances by using the sex-averaged recombination rates released by HapMap54,55 while scaling estimates for X by two-thirds to translate from female-only to sex-averaged rates (see Web Resources). We assigned all bases according to their genetic distance from the nearest gene to each of seven bins spanning distances between 0 and 0.4 cM in such a manner that an equal fraction of bases in X fit each bin. We also used the same seven bins as dictated by X to partition sites on the autosomes (Figure S1). We repeated the construction of a collection of 100 kb loci separately for sites within each bin and used these to estimate nucleotide diversity, absolute X/A diversity, and relative X/A diversity in each bin according to the same procedures described above. We tested for a linear correlation between these estimates and the mean genetic distance of bases in a bin by

The American Journal of Human Genetics 94, 1–18, June 5, 2014 3

Please cite this article in press as: Arbiza et al., Contrasting X-Linked and Autosomal Diversity across 14 Human Populations, The American Journal of Human Genetics (2014), http://dx.doi.org/10.1016/j.ajhg.2014.04.011

obtaining the Pearson correlation coefficient in each of 1,000 moving block bootstrap samples and performing a two-tailed ztest based on their mean and SD.

Simulations and Analytical Approximation for Models of Human Demography Even in the absence of sex-biased processes or differential selection on X and the autosomes, simple demographic events such as population contractions, expansions, and bottlenecks can lead to departures from the expectation of 0.75 for NX / NA.2,25 Hence, we considered several recently published models of human demographic history—estimated for the autosomes—of West African, European, and East Asian populations50,56–61—and explored their effect on estimates of both absolute and relative X/A diversity when no sex-biased processes were considered. Specifically, for most models, we could address this effect by using an analytical solution that accounts for the contribution of each of a number of epochs to the time to the most recent common ancestor (tMRCA) in each compartment:62 0 EðtMRCAÞ ¼ 2@N1 þ

2

M 1 X

4ðNmþ1  Nm Þe



31 m P Tl 2Nl 5A: l¼1

m¼1

Here, Ni is the effective population size in epoch i, and Ti is its duration in generations. Given that all demographic models we considered have been estimated for the autosomes, for X we multiplied all Ni values by 0.75. For phases of exponential growth, we used single generation ‘‘epochs’’ as an approximation. To verify this approximation, as well as to allow models with migration between populations, we additionally ran coalescent simulations by using the program ms with parameters exactly as published for each demographic model.63 Each simulation was run for 100 chromosomes, 100,000 replicates, and a 10,000 bp locus (see Table S2 for additional details). We repeated both analytical approximation and coalescent simulations for X and the autosomes and derived their predicted effect on estimates of absolute X/A diversity within populations and of relative X/A diversity between populations (each was dependent on the demographic history, parameters, and migration rates for all populations considered in each model).

Results Genome-wide Levels of Autosomal, X-Linked, and X/A Diversity To contrast levels of diversity between X and the autosomes, we estimated nucleotide diversity (p) normalized by divergence from rhesus macaque for X and the autosomes separately after applying a stringent array of filters to the data. After normalization, we computed the levels of X/A nucleotide diversity with an additional correction for polymorphisms in the ancestral population to humans and macaques (Material and Methods). Figures 1 and 2 represent the resulting estimates of normalized diversity and X/A diversity for each of the 14 populations and across all individuals in each of the four ancestry-based continental groups, respectively, as defined by the 1000 Genomes Project.32 Similar results for normalization with orangutan, as well as without normalization by divergence from an outgroup, are presented in Figures S2–S5.

4

The American Journal of Human Genetics 94, 1–18, June 5, 2014

Genome-wide diversity estimates are in line with those from recent studies and point to higher levels of autosomal diversity in African (p z 0.1% before normalization; Figure S2) than in non-African (p z 0.08%) populations.32,50,62,64–66 As expected, normalized diversity was lower on X than on the autosomes for all populations (Figure 1A; Figure S3A). X/A diversity was similar for all three populations from the African (AFR) continental group: all three estimates were in the range 0.76–0.77 (Figure 1A). It was 0.64–0.65 for all five populations of European (EUR) ancestry and 0.60–0.62 for the three East Asian (ASN) populations. These estimates of X/A diversity are similar to our previous CEU and YRI results based on pilot data from the 1000 Genomes Project,64 although they are consistently about 4% higher for both populations (0.762 versus 0.734 for YRI and 0.644 versus 0.615 for CEU).1 This difference between phase 1 data and the pilot data from the 1000 Genomes Project stems largely from the higher X diversity estimates (Figure S2), most likely due to improvements in calling for that part of the genome.1,32,64 In a paper predating the 1000 Genomes data, Keinan et al. estimated nucleotide diversity and X/A diversity by calling variants across only two haploid sequences at a time, thereby applying the same ascertainment on X and the autosomes.2 Estimates from that paper for West Africans (0.763), northern Europeans (0.635), and East Asians (0.613) are practically identical (even though they are based on orders of magnitude fewer data) to our corresponding estimates for populations from all three continental groups, supporting the idea that improved calling explains the ~4% discrepancy between the phase 1 and pilot data. X/A diversity varied more among the AMR continental group, which consists of admixed populations from the Americas (Figure 1A); estimates were 0.71 for Puerto Rican (PUR), 0.66 for Colombian (CLM), and 0.64 for Mexican American (MXL) populations. The differences between these populations are in line with their exhibiting different admixture proportions of ancestry from African, European, and Native American populations.65,67 Furthermore, these admixture proportions have been shown to be sex biased, which would differentially affect diversity on X and the autosomes.67,68 For these reasons, we do not discuss results for AMR populations for the following analyses. Increased Autosomal, X-Linked, and X/A Diversity with Increasing Distance from Genes To examine the effect of natural selection on these quantities, we considered genetic distance from the nearest gene as a proxy for decreased effect of diversity-reducing selection on linked sites. We partitioned the data according to ranges of distance from the nearest gene such that each partition encompassed an equal fraction of X (Figure S1). For all 14 populations, both autosomal and X-linked diversity increased with distance from genes (Figure 3A). Across the different populations, linear

Please cite this article in press as: Arbiza et al., Contrasting X-Linked and Autosomal Diversity across 14 Human Populations, The American Journal of Human Genetics (2014), http://dx.doi.org/10.1016/j.ajhg.2014.04.011

Figure 1. Estimates of Autosomal, X-Linked, and Absolute X/A Diversity For each of the 14 populations from 1000 Genomes, (A) genome-wide estimates are shown for nucleotide diversity (p) on the autosomes (left-most), nucleotide diversity on X (middle), and the ratio of the two (X/A diversity, right-most). All estimates follow a series of data filters, including the exclusion of genes, and are normalized by genetic divergence from rhesus macaque, thereby controlling for differential mutation rates on X and the autosomes (Material and Methods). Populations are sorted in decreasing order of autosomal diversity. This order is followed throughout the paper and corresponds to a grouping by ancestry-based continental groups: AFR, African; AMR; American; EUR, European; EAS, East Asian. Population abbreviations are listed in Table S1. (B) Estimates of nucleotide diversity in the last partition, furthest from genes (at least 0.2 cM from the nearest gene; see also Figure 3), after partition of the genome by distance from the nearest gene. These loci were affected to a lesser extent by diversity-reducing selection at linked sites and hence showed higher levels of both autosomal and X-linked diversity in all populations. The horizontal dotted line in the right-most panel denotes the expectation of X/A diversity ¼ 0.75 under neutrality and several additional assumptions (see main text). Error bars denote 51 SEM, computed with a block bootstrap approach (Material and Methods). We obtained similar results when instead using divergence from orangutan for normalization (Figure S3) and when considering only levels of human nucleotide diversity without normalization (Figure S2).

correlation coefficients between autosomal diversity and mean genetic distance of each partition were between R ¼ 0.55 and R ¼ 0.60 and highly significant (p < 1012). With a coefficient between R ¼ 0.86 and R ¼ 0.91 (p < 1012 for all), the equivalent for X was even more strongly correlated. We further pooled individuals by continental groups and observed similar results: R ¼ 0.57–0.59 for the autosomes and 0.86–0.91 for X (p < 1012 for all; Figure 4A). These results are in line with our previous results based on the 1000 Genomes pilot data,1 where we showed that the increased diversity with distance from genes closely matches predictions of the model of McVicker et al.,69 and are thereby consistent with the diversity-reducing effect of selection on linked sites

through purifying selection (background selection), positive selection (genetic hitchhiking), or both. Increased diversity with distance from genes is to be expected plainly as a result of increased levels of selection on genes. Next, we aimed to test whether this effect of selection shapes diversity to a different extent on X and the autosomes. Hence, we estimated X/A diversity separately in each partition of distance from genes (Figures 3B and 4B) and observed a very strong correlation (ranging between R ¼ 0.90 and R ¼ 0.94; p < 1012 for all) between the two in all populations and continents. This result entails a greater increase in diversity with genetic distance from genes for X than for the autosomes, suggesting that diversity reduction due to selection at linked sites has

The American Journal of Human Genetics 94, 1–18, June 5, 2014 5

Please cite this article in press as: Arbiza et al., Contrasting X-Linked and Autosomal Diversity across 14 Human Populations, The American Journal of Human Genetics (2014), http://dx.doi.org/10.1016/j.ajhg.2014.04.011

Figure 2. Estimates of Autosomal, X-Linked, and Absolute X/A Diversity by Continental Group The figure mirrors Figure 1, except that we merged all individuals from each of the four ancestry-based continental groups to obtain diversity estimates for each group. Figure S4 presents results without any normalization, and Figure S5 presents similar results with normalization by divergence from orangutan instead of macaque.

been a more powerful force on X. The general trends in overall diversity with distance to genes are evident both before and after normalization by divergence from either macaque or orangutan (Figures 3B, 4B, and S6–S9). These results show a trend similar to Hammer et al.’s recent results based on individuals of European ancestry8 and are quantitatively similar to our previous observations.1 Autosomal, X-Linked, and X/A Diversity in Relatively Neutral Regions The correlation with distance from genes of autosomal, X-linked, and X/A diversity entails that the genomewide results (Figures 1A and 2A) serve merely as averages across different levels of natural selection. We next revisited these quantities in relatively neutral regions by considering only estimates in the last bin (furthest from the nearest gene), encompassing a total of ~4 Mb of X-linked and ~67 Mb of autosomal sequences that are at least 0.2 cM away from the nearest gene. Although we do not expect these loci to be completely neutral—because of both selection on nonannotated and noncoding elements and remaining weak LD with genes—estimates based on these loci were much less

6

The American Journal of Human Genetics 94, 1–18, June 5, 2014

affected by selection on genes than were genome-wide estimates. Importantly, these loci—as all regions we analyzed—excluded conserved noncoding elements, noncoding genes, and regions extending over an inclusive definition of coding transcripts (Material and Methods). Figures 1B and 2B present diversity estimates in these loci and consist of a summary of the last partition from Figures 3 and 4, respectively. Similar to the genome-wide results, diversity levels in these loci were higher in African populations (p z 1.1% before normalization; Figure S2) than in non-African populations (p z 0.8%–0.9%), and normalized diversity on X was lower for all populations and continents (Figures 1B and 2B). For the autosomes, diversity levels in these relatively neutral loci were 4.6%–5.5% higher than genome-wide estimates; the higher end of this range was observed for non-African populations (Figure 1). On X, diversity levels in these relatively neutral loci were 19%–29% higher than X-wide estimates; the lower end of this range was observed for African populations (Figure 1). Based on these X-linked and autosomal relatively neutral regions alone, estimates of X/A diversity were

Please cite this article in press as: Arbiza et al., Contrasting X-Linked and Autosomal Diversity across 14 Human Populations, The American Journal of Human Genetics (2014), http://dx.doi.org/10.1016/j.ajhg.2014.04.011

Figure 3. Autosomal, X-Linked, and Absolute X/A Diversity Increase with Genetic Distance from the Nearest Gene (A) Estimates of nucleotide diversity normalized by divergence to macaque on the autosomes and X are presented for partitions of the genome by distance from the nearest gene, as in Figure 1. (B) The absolute X/A diversity, as the ratio of estimates from (A), is presented separately for each partition. The horizontal dotted line denotes the expectation of X/A diversity ¼ 0.75 under neutrality and several additional assumptions (see main text). Error bars denote 51 SEM, computed with a moving block bootstrap procedure (Material and Methods). The population code is indicated above each panel (panel columns correspond to continental groups and are ordered as in Figure 1). The labels on the x axis represent the boundaries between partitions, which were selected such that each partition encompassed an equal fraction of X (Figure S1). For comparison across populations, values from the last bin are reproduced in Figure 1B. We obtained similar results when instead using divergence from orangutan (Figure S6) and when considering only levels of human nucleotide diversity without normalization (Figure S7).

13%–22% higher than their genome-wide equivalents for all populations and continents (Figures 1 and 2). Similar to genome-wide estimates, estimates of X/A diversity in these regions were highest in African populations and higher in Europeans than in East Asians. They were similar for all populations from the AFR continental group: all three estimates were in the range 0.86–0.88 (versus 0.76–0.77 genome-wide). Estimates were 0.78–0.79 for all five EUR populations (versus 0.64–0.65 genome-wide) and 0.70–0.75 for all three ASN populations (versus 0.60–0.62 genome-wide). Under the assumption of neutrality in these relatively neutral regions, as well as historical sex balance and constant population size, these normalized estimates of X/A diversity are expected to be 0.75, as described in the Introduction. The above values in AFR populations, as judged by their SEs (Figures 1B and 2B), are significantly higher than this expectation. For EUR (as well as AMR), they are marginally above the 0.75 expectation and are nonsignificantly below expectation for the ASN populations. However, there is no reason to think that the outlined assumptions are met, and in the following section, we examine several factors that affect the comparison between these estimates and the oversimplified 0.75 expectation.

Improved Normalization by Divergence from an Outgroup Estimates of absolute X/A diversity (as opposed to relative X/A diversity, described in the next section) are bound to be sensitive to many details of the methods used to obtain them because many factors can have differential impacts on X-linked and autosomal estimates. Perhaps the most important and the most problematic is the normalization by divergence from an outgroup; this factor is crucial for accounting for differences in mutation rates between X and the autosomes.1,12 The choice of outgroup and the details of normalization by its genetic divergence from humans have been highlighted as a potential explanation for previous studies reporting different estimates of X/A diversity.12,14,15 In order to consider deviations of absolute X/A diversity from the expectation, especially in relatively neutral regions, and the effects of demographic processes, we based the results described above on a normalization more accurate than that in our previous work.1,2 This improved normalization accounts for the differential effect on X and autosomal divergence estimates of polymorphisms in the population ancestral to humans and the outgroup (Material and Methods). To further investigate the effect of normalization by divergence on our results, as well as the improvement

The American Journal of Human Genetics 94, 1–18, June 5, 2014 7

Please cite this article in press as: Arbiza et al., Contrasting X-Linked and Autosomal Diversity across 14 Human Populations, The American Journal of Human Genetics (2014), http://dx.doi.org/10.1016/j.ajhg.2014.04.011

Figure 4. Increase in Absolute Diversity with Increased Genetic Distance by Continental Group The figure mirrors Figure 3, except that we merged all individuals from each of the four continental groups to obtain diversity estimates in each partition. Figures S8 and S9 present similar results with normalization by orangutan divergence and without normalization by divergence, respectively.

approach used in previous studies1,2 without the additional correction for ancestral polymorphisms, X/A diversity in relatively neutral regions was on average 4.8% higher when we normalized by divergence from orangutan in these regions than when we used the analogous estimates with macaque normalization (Figure S10A). After the additional correction for ancestral polymorphism, X/A diversity in these regions was only ~2% higher for the orangutan than for the macaque normalization (Figure S10B). These results entail that ancestral polymorphisms account for most of the difference in X/A diversity with different outgroups, which explains small discrepancies between previously published estimates.1,2,6–8 The remaining 2% difference can be due to uncertainty in the estimation of ancestral effective population sizes used in the correction (Material and Methods) and differential changes that occurred in recent hominid evolution between autosomal and X-linked mutation rates, e.g., as a result of a relative acceleration of the male mutation rate15 (see Discussion).

from correcting for ancestral polymorphisms, we compared estimates based on each of two outgroups: orangutan and rhesus macaque. When we reproduced the

8

The American Journal of Human Genetics 94, 1–18, June 5, 2014

Relative Autosomal, X-Linked, and X/A Diversity between Populations We introduced another approach to overcome any issues due to differences in mutation rates between X and the autosomes, as well as additional potential problems such as differences in SNP ascertainment biases between the two. The idea is that factors that are identical for all populations, as is the normalization by divergence, do not affect the comparison of estimates between populations. Hence, we compared estimates of X/A diversity between each pair of populations by dividing the estimate in one population by that of another to obtain relative X/A diversity. In this ratio, both the numerator and denominator are normalized by the same divergence factor and hence cancel each other out. For completeness, we similarly obtained relative diversity on X and the autosomes

Please cite this article in press as: Arbiza et al., Contrasting X-Linked and Autosomal Diversity across 14 Human Populations, The American Journal of Human Genetics (2014), http://dx.doi.org/10.1016/j.ajhg.2014.04.011

Figure 5. The CEU/YRI Ratio of Autosomal, X-Linked, and X/A Diversity for Bins of Genetic Distance from the Nearest Gene The relative estimates reflect the ratio of normalized diversity in CEU to that in YRI in each bin of distance from the nearest gene for (A) X and the autosomes separately and (B) X/A diversity in each population (relative X/A diversity); the genome-wide average across all loci is indicated by the dotted line. Error bars denote 5 1 SEM, computed with a moving block bootstrap procedure (Material and Methods). Estimates of relative X/A diversity for additional population pairs are summarized in Tables S5 and S6, and results by continental groups are presented in Figure 6.

separately between populations. We estimated the relative diversity of X-linked, autosomal, and X/A nucleotide diversity by using the less diverse population as the numerator. Considering relative estimates between populations also allows the study of a circumscribed time period since the two populations split. This approach has proved successful for studying differences in X/A diversity in Drosophila both between populations (African and non-African) and between species (D. melanogaster and D. simulans).28,29,70–74 By further considering distance from genes, we could determine whether observed differences between populations are due to differential selective forces or differences in demographic history.

We started by considering a pair of populations, one of West African ancestry (YRI) and one of northwestern European ancestry (CEU), which are the two populations we analyzed previously on the basis of the pilot data from the 1000 Genomes Project.1 As shown above, X-linked and autosomal diversity in both CEU and YRI populations correlated significantly with distance from genes (Figure 3A; p < 1012). Figure 5A presents the relative X-linked and autosomal diversity between CEU and YRI populations as a function of the same partition. It shows a less clear and nonsignificant relationship with distance from genes: the correlation coefficient with mean genetic distance of each partition was R ¼ 0.36 (p ¼ 0.43; Table 1) for the autosomes and R ¼ 0.60 for X (p ¼ 0.15; Table 1). Similar results held for relative estimates between other non-African and African population pairs (Table 1) and for continental pairs (Figure 6; Table S3). The only exception outside comparisons involving admixed populations is that relative X-linked diversity between Iberian and African samples showed modest evidence of a positive correlation with distance to genes, but only prior to accounting for multiple testing (0.78 % R % 0.79, 0.03 % p < 0.05). Importantly, relative X-linked diversity between non-Africans and Africans was consistently smaller than that of autosomal loci across all distances from genes. We next examined estimates of relative X/A diversity between populations. Figure 5B shows both relative X/A diversity for each bin of distances to the nearest gene and genome-wide relative X/A diversity between CEU and YRI populations. The genome-wide relative X/A diversity estimate was 0.848 5 0.027, closely matching our previous estimate of 0.84.1 Although this estimate showed some variation across bins (between 81% 5 2.7% for the second bin and 90% 5 2.9% for the last bin), little evidence of a trend was observed (R ¼ 0.59, p ¼ 0.16; Table 2). These results demonstrate that X/A diversity is lower in CEU than in YRI across all bins. Similar results were observed when we considered the ratio of each of the other European populations to each of the other African populations. Genome-wide relative X/A diversity for these were in the range of 0.835 5 0.027 to 0.852 5 0.027, whereas the continental estimate was 0.842 5 0.013 (Figure 6; Table S4). In general, estimates of relative X/A diversity were devoid of significant evidence of a correlation with distance to genes (Table 2). Respective estimates for relatively neutral loci in the bin furthest from genes were in close agreement with the above estimates, as expected by the lack of correlation with distance from genes, but were consistently ~5% higher than the genome-wide estimates for both the European-to-African ratio and other population comparisons (Tables S5 and S6). This difference should be interpreted with caution given that it was not observed as part of a monotonic increase across distances from genes. Furthermore, because all comparisons between the same bins in

The American Journal of Human Genetics 94, 1–18, June 5, 2014 9

Statistics for the Correlation between Relative Autosomal and X-Linked Diversity and Distance to Genes

LWK

YRI

LWK – YRI

ASW

0.15 (0.74) 0.21 (0.65) –

PUR

0.1 (0.82) 0.01 (0.98)

ASW 0.49 (0.26) 0.58 (0.17) –

CLM

MXL

IBS

TSI

GBR

CEU

FIN

JPT

CHB

CHS

0.46 (0.3)

0.43 (0.33)

0.26 (0.58) 0.78 (0.04)*

0.69 (0.09)

0.68 (0.09)

0.62 (0.14)

0.64 (0.12)

0.56 (0.19)

0.73 (0.06)

0.64 (0.12)

0.44 (0.32)

0.4 (0.38)

0.21 (0.65) 0.79 (0.03)*

0.68 (0.1)

0.68 (0.09)

0.6 (0.15)

0.64 (0.12)

0.58 (0.18)

0.73 (0.06)

0.61 (0.14)

0.47 (0.29)

0.25 (0.58) 0.78 (0.04)*

0.71 (0.07)

0.71 (0.07)

0.51 (0.24)

0.63 (0.13)

0.68 (0.09)

0.67 (0.1)

0.81 (0.03)*

0.72 (0.07)

0.8 (0.03)*

0.8 (0.03)*

0.66 (0.11)

0.79 (0.03)*

0.33 (0.47)

0.71 (0.07)

0.46 (0.3)

0.5 (0.26) 0.68 (0.09)

0.81 (0.03)*

0.83 (0.02)*

0.71 (0.07)

0.76 (0.05)*

0.2 (0.67)

0.7 (0.08)

0.47 (0.29)

0.68 (0.09)

0.8 (0.03)*

0.79 (0.04)*

0.75 (0.05)

0.71 (0.07)

0.34 (0.46)

0.76 (0.05)*

0.69 (0.08)

PUR

0.08 (0.86)

0.04 (0.93)

0.28 (0.54) –

0.16 (0.73) 0.17 (0.71) 0.77 (0.04)*

CLM

0.23 (0.63)

0.19 (0.68)

0.47 (0.28)

MXL

0.06 (0.9)

0.03 (0.96)

0.23 (0.62) 0.04 (0.92)

IBS

0.19 (0.69)

0.15 (0.75)

0.33 (0.47)

0.29 (0.53)

0.02 (0.96)

0.23 (0.63) –

0.36 (0.43)

0.2 (0.67)

0.46 (0.29) 0.31 (0.5)

0.42 (0.35) 0.18 (0.71)

0.38 (0.41)

TSI

0.26 (0.57)

0.22 (0.63)

0.42 (0.35)

0.68 (0.1)

0.16 (0.73)

0.44 (0.33) 0.21 (0.65)



0.48 (0.28)

0.58 (0.18)

0.29 (0.53)

0.08 (0.86)

0.26 (0.58)

0.71 (0.08) 0.47 (0.29)

0.38 (0.4)

0.08 (0.87)

0.36 (0.42)

0.36 (0.43)



0.19 (0.69)

0.19 (0.69)

0.13 (0.77)

0.32 (0.49)

0.06 (0.9)

0.25 (0.59)



0.67 (0.1)

0.17 (0.71)

0.39 (0.38)

– 0.36 (0.43) –

0.46 (0.3)



0.05 (0.91)

GBR

0.36 (0.43)

0.32 (0.49)

0.51 (0.24)

0.77 (0.04)*

0.31 (0.51)

0.53 (0.22) 0.36 (0.43)

CEU

0.39 (0.39)

0.36 (0.43)

0.53 (0.22)

0.83 (0.02)*

0.44 (0.33)

0.7 (0.08) 0.63 (0.13)

FIN

0.31 (0.5)

0.28 (0.55)

0.48 (0.28)

0.87 (0.01)*

0.33 (0.47)

0.69 (0.09) 0.24 (0.61)

0.23 (0.62)

0.13 (0.78)

JPT

0.48 (0.28)

0.46 (0.3)

0.55 (0.2)

0.63 (0.13)

0.46 (0.29)

0.69 (0.09) 0.55 (0.2)

0.5 (0.25)

0.42 (0.34)

0.39 (0.39)

0.47 (0.28)

CHB

0.46 (0.3)

0.44 (0.32)

0.54 (0.21)

0.61 (0.15)

0.43 (0.33)

0.67 (0.1)

0.5 (0.25)

0.47 (0.29)

0.37 (0.41)

0.31 (0.49)

0.44 (0.33)

0.3 (0.51) –

CHS

0.53 (0.22)

0.52 (0.24)

0.6 (0.16)

0.68 (0.09)

0.55 (0.2)

0.75 (0.05) 0.62 (0.14)

0.59 (0.16)

0.52 (0.23)

0.52 (0.23)

0.58 (0.17)

0.71 (0.07)

0.8 (0.03)*

0.3 (0.51)

0.54 (0.22) –

0.72 (0.07) 0.72 (0.07)



Correlation coefficients and z-test p values (in parentheses) are shown for the ratio of autosomal (upper diagonal) or X-linked (lower diagonal) diversity estimates and genetic distance to the nearest gene for all population pairs (Material and Methods). *p value < 0.05.

Please cite this article in press as: Arbiza et al., Contrasting X-Linked and Autosomal Diversity across 14 Human Populations, The American Journal of Human Genetics (2014), http://dx.doi.org/10.1016/j.ajhg.2014.04.011

10 The American Journal of Human Genetics 94, 1–18, June 5, 2014

Table 1.

Please cite this article in press as: Arbiza et al., Contrasting X-Linked and Autosomal Diversity across 14 Human Populations, The American Journal of Human Genetics (2014), http://dx.doi.org/10.1016/j.ajhg.2014.04.011

Figure 6. Estimates of Relative Autosomal, X-Linked, and X/A Diversity for Continental Groups Results similar to those in Figure 5 are presented after individuals were organized into continental groups: (A) EUR/AFR ratio, (B) ASN/AFR ratio, (C) AMR/AFR ratio, (D) ASN/EUR ratio, (E) EUR/AMR ratio, and (F) ASN/AMR ratio.

different populations rely on the exact same set of loci, an increase in different population comparisons could arise as a result of consistent differences between these dependent estimates if subtle biases correlating with genetic distance to genes are introduced through the partitioning scheme—e.g., variations in linkage structure between populations of African and non-African ancestry. Across the board, the estimates for other continental populations confirm that the observed reduction in X/A

diversity holds for all comparisons of non-African to African populations. Estimates of relative X/A diversity between East Asian and African population samples were between 0.780 5 0.028 and 0.810 5 0.029 genome-wide and between 0.801 5 0.033 and 0.865 5 0.031 in the loci furthest from genes. These estimates were lowest for JPT (Japanese in Tokyo), followed by CHB (Han Chinese in Bejing, China) and then CHS (Southern Han Chinese), consistently against all African

The American Journal of Human Genetics 94, 1–18, June 5, 2014 11

Correlation Statistics for Relative X/A Diversity with Distance to Genes

LWK

YRI

ASW

PUR

CLM

MXL

IBS

TSI

GBR

CEU

FIN

JPT

CHB

CHS

LWK





























YRI

0.09 (0.84)









































































































ASW

0.2 (0.67)

0.19 (0.68)





PUR

0.44 (0.32)

0.48 (0.27)

0.49 (0.26)



CLM

0.38 (0.4)

0.38 (0.4)

0.39 (0.38)

0.11 (0.82)

MXL

0.25 (0.59)

0.23 (0.62)

0.21 (0.64)

0.14 (0.76)

0.35 (0.44)

IBS

0.75 (0.05)

0.78 (0.04)*

0.75 (0.05)

0.74 (0.05)

0.65 (0.11)

0.63 (0.13)

















TSI

0.68 (0.09)

0.69 (0.08)

0.7 (0.08)

0.79 (0.03)*

0.81 (0.03)*

0.75 (0.05)

0.36 (0.43)















GBR

0.66 (0.11)

0.68 (0.09)

0.69 (0.09)

0.79 (0.04)*

0.84 (0.02)*

0.75 (0.05)

0.23 (0.62)

0.45 (0.31)













CEU

0.58 (0.17)

0.59 (0.16)

0.59 (0.16)

0.6 (0.16)

0.68 (0.09)

0.68 (0.09)

0.49 (0.26)

0.7 (0.08)

0.75 (0.05)



FIN

0.61 (0.15)

0.63 (0.13)

0.65 (0.11)

0.76 (0.05)*

0.78 (0.04)*

0.66 (0.11)

0.32 (0.48)

0.01 (0.99)

0.48 (0.28)

0.34 (0.45)

JPT

0.4 (0.37)

0.43 (0.34)

0.46 (0.3)

0.09 (0.84)

0.04 (0.94)

0.17 (0.71)

0.46 (0.3)

0.36 (0.43)

0.41 (0.36)

CHB

0.64 (0.12)

0.65 (0.11)

0.72 (0.07)

0.54 (0.21)

0.54 (0.21)

0.67 (0.1)

0.23 (0.61)

0.03 (0.94)

CHS

0.46 (0.3)

0.45 (0.31)

0.49 (0.27)

0.18 (0.69)

0.15 (0.74)

0.41 (0.37)

0.42 (0.34)

0.34 (0.45)

















0.24 (0.61)

0.36 (0.43)







0.14 (0.76)

0.11 (0.81)

0.03 (0.95)

0.66 (0.1)





0.4 (0.37)

0.22 (0.64)

0.32 (0.49)

0.13 (0.77)

0.79 (0.04)*



Correlation coefficients and z-test p values (in parenthesis) are shown for estimates of relative X/A diversity and genetic distance to the nearest gene for all population pairs (Material and Methods). *p value < 0.05.

Please cite this article in press as: Arbiza et al., Contrasting X-Linked and Autosomal Diversity across 14 Human Populations, The American Journal of Human Genetics (2014), http://dx.doi.org/10.1016/j.ajhg.2014.04.011

12 The American Journal of Human Genetics 94, 1–18, June 5, 2014

Table 2.

Please cite this article in press as: Arbiza et al., Contrasting X-Linked and Autosomal Diversity across 14 Human Populations, The American Journal of Human Genetics (2014), http://dx.doi.org/10.1016/j.ajhg.2014.04.011 Table 3. Absolute X/A Diversity Predicted by Several Models of Human Demography ms Simulations Demographic Model

E(tMRCA) for X/A Diversity

X/A

SEM

Constant population size

0.75

0.751

0.0017

0.784

0.786

0.0015

0.784

0.786

0.0015

Tennessen et al.

0.785

0.780

0.0016

Schaffner et al.60

0.792

0.798

0.0014

0.700

0.707

0.0022

0.696

0.699

0.0022

0.659

0.719

0.0025

0.712

0.724

0.0020

0.711

0.710

0.0020

0.745

0.751

0.0017

0.745

0.751

0.0017

0.690

0.687

0.0022

0.654

0.720

0.0027

0.704

0.720

0.0021

0.736

0.734

0.0017

Africans Gutenkunst et al.57 Gravel et al.

58 61

Europeans Keinan et al.50 56

Gazave et al.

Gutenkunst et al. Gravel et al.

57

58 61

Tennessen et al. Schaffner et al.

60

Nelson et al.59 East Asians Keinan et al.50 Gutenkunst et al. Gravel et al.

58

Schaffner et al.

60

57

Absolute X/A diversity predicted by the best-fitting demographic model as estimated in each of the studies indicated. Models are considered for the demographic history of Africans, Europeans, and East Asians, as available in each study. X/A diversity based on E(tMRCA) is an analytical solution for a simplified model that can consider only piecewise changes in population sizes and no migration (Material and Methods), and X/A diversity based on ms65 coalescent simulations captures the complete model in each study (Material and Methods and Table S2). Considerable differences between the two were observed only for models that include migration, in which case the coalescent simulations provide a much more accurate estimate of model predictions.50,56–61

populations. All together, and also including results at the continental level (ASN/AFR ratio ¼ 0.820 5 0.032), East Asian populations showed even lower estimates of relative X/A diversity than did European populations in comparison to Africans. Considering the effects that changes in population size can have on estimates of X/ A diversity, as discussed in the next section, this last observation agrees with studies reporting more drift in East Asians than in Europeans2,50,57,58,60,75 and points to possible differences in the demographic history of the different East Asian populations. Interestingly, in spite of this stronger reduction in relative X/A diversity between ASN and AFR populations, the evidence of any potential correlation with distance to genes in East Asian relative to African populations is considerably weaker than that observed for Europeans, highlighting the need to consider forces other than the local effects of selection on linked sites to explain the patterns of reduction in estimates of X/A diversity observed out of Africa.

Can Historical Changes in Population Size Explain the Empirical Observations? X/A diversity is sensitive to historical changes in population size, even if these do not introduce a deviation from equal female and male effective population sizes.2,25 One of the most interesting observations arising from previous comparisons of X/A diversity from more limited data sets1,2,8 has been that patterns of both absolute and relative X/A diversity for some human populations are sufficiently striking to merit the consideration of factors beyond known human demographic events such as population bottlenecks, contractions, growth, and migration. Here, we examined the prediction of several of the most recently published models of human demography50,56-61 for absolute and relative X/A diversity (Material and Methods). These provide demographic models for representative African, East Asian, and European populations. All these models were estimated to fit autosomal data. Hence, to test the null hypothesis that these models would explain observed patterns without sex biases, we incorporated the respective model of each for X by assuming an equal proportion of males and females throughout history. Estimates of absolute X/A diversity for the various models of African, European, and East Asian demographic history are presented in Table 3. Simulations produced the expected X/A diversity ¼ 0.75 when no changes in population size were experienced. However, all other models incorporating African demographic events, including population expansion, led to an increment from this expectation (0.780–0.798 across models). Bottlenecks dominated the effect on the two other populations, leading to reductions from 0.75 in the absolute X/A diversity of Europeans (0.699–0.751) and East Asians (0.687– 0.734). Although the direction of these deviations is expected, using a set of the most plausible scenarios of recent human demographic history (Table S2) provides new values and ranges accounting for variations in modeling strategies for contrasting empirical observations. Compared with our empirical results, the results of these simulations have a few salient features. The range of estimates of absolute X/A diversity for relatively neutral regions, furthest from genes, tended to exceed or match the expectation from recent models of human demography: African populations showed a marked increase of at least 7.8% (0.86–0.88) from the expectation (0.7800.798), European populations also had a higher increase, albeit to a smaller degree of ~4% (0.78–0.79), than the expectation (0.699–0.751), and East Asian populations showed estimates overlapping or only modestly exceeding (0.70–0.75) the expectation from demography (0.687–0.734). Although this increase could partially be due to the confounding effect of natural selection on some of the modeling strategies, it points to the possibility that additional factors leading to a relative increase in the female versus male Ne, such as effective polygyny7 over extended historical time periods, have played a past role in human populations.

The American Journal of Human Genetics 94, 1–18, June 5, 2014 13

Please cite this article in press as: Arbiza et al., Contrasting X-Linked and Autosomal Diversity across 14 Human Populations, The American Journal of Human Genetics (2014), http://dx.doi.org/10.1016/j.ajhg.2014.04.011 Table 4. Relative X/A Diversity Predicted by Several Models of Human Demography Demographic Model

ms Simulations

Europeans/Africans Gutenkunst et al.57 Gravel et al.

58

0.9210 61

Tennessen et al. Schaffner et al.

0.9150

60

0.9092 0.9415

East Asians/Africans Gutenkunst et al.57 Gravel et al.

58

Schaffner et al.

0.9147 0.9169

60

0.9193

East Asians/Europeans Keinan et al.50

0.9718

Gutenkunst et al. Gravel et al.

58

Schaffner et al.

57

0.9996 0.9957

60

0.9764

Relative X/A diversity between each pair of continental populations from the same studies as in Table 3. See Table 3 for complete details.

Estimates of relative X/A diversity paint a more refined picture because they capture recent history since the split of the pair of populations considered. Table 4 provides a summary of relative X/A diversity between the three continental populations as predicted by the different demographic models. These generally predicted reductions from the expectation of 1 to between 0.91 and 0.94 for EUR/AFR, 0.91 and 0.92 for ASN/AFR, and 0.97 and 1.01 for ASN/EUR comparisons. The empirical estimates of relative X/A diversity genome-wide were in the range of 0.84– 0.85, 0.78–0.81, and 0.93–0.96 for EUR/AFR, ASN/AFR, and ASN/EUR comparisons, respectively. Although the elevated estimates of relative X/A diversity in the last bin of distance to genes (in comparison to the genome-wide average) are most likely inconsequential, as described above, one could conservatively entertain how this difference would affect results. Tables S5 and S6 show that whereas the relative X/A diversity between EUR and AFR populations for this subset of most neutral loci far from genes fell close to the values predicted by demographic models, the analogous estimates between ASN and AFR populations fell significantly below the lowest estimate of 0.91. The models of East Asian demography evaluated have been based on combined data from CHB and JPT populations, so the greater-than-expected reduction in empirical estimates for JPT populations (which showed the lowest estimates of relative X/A diversity in comparison to Africans) as compared to these models might be a result of models estimated over CHB and JPT individuals as a single population. However, the greater-than-expected reduction for CHB and CHS populations is robust to this aspect of the modeling approaches considered (Tables S5

14 The American Journal of Human Genetics 94, 1–18, June 5, 2014

and S6). Another interesting observation is that the empirical estimates of relative X/A diversity between ASN and EUR populations also generally fell below the ranges predicted by the models, although not significantly. All together, even when we conservatively focus on the higher estimates of relative X/A diversity observed only for the subset of loci in the bin furthest from genes, empirical estimates from the 1000 Genomes populations still point to additional departures from our current understanding of the patterns of expansion, contraction, and growth in human demographic history beyond those previously suggested to be associated with the Out-of-Africa event.2,17,18,31,50,56–61,76 Together with a general lack of clear evidence of a correlation between estimates of relative X/A diversity and distance to genes, the comparison to model predictions for relative X/A diversity suggests a generally stronger reduction of X/A diversity than expected in non-Africans than in Africans. These results highlight the role of demographic events that are not captured by the models and that affect both EUR and ASN populations and are hence most likely associated with the dispersal out of Africa. Such events include, e.g., stronger-than-expected bottlenecks, founder events, or the possibility of waves of male-biased migrations during the dispersal out of Africa.2

Discussion Although several previous studies have looked at patterns of diversity in human X and how they relate to those of autosomes,1–18 they have been based on genotyping data sets, a small set of resequenced loci in six populations, or genomic sequencing data from at most two populations. The present work was made possible by the rich set of complete genome sequences from 14 populations and four continental groups from phase 1 of the 1000 Genomes Project.32 We characterized X/A diversity in greater detail to determine the extent to which genome-wide patterns generalize to additional populations and the degree to which these have been affected by levels of natural selection. We combined these patterns with a richer set of results to tease apart the extent to which different forces have shaped X/A diversity in human populations. Our results for autosomal, X-linked, and absolute X/A diversity confirm and extend previous observations,1–4 showing that differences are evident genome-wide and broadly generalize to additional populations across continental groups. African populations, followed by Europeans and East Asians, show the highest levels of both X and autosomal diversity. Within Africans and Europeans, all populations show similar levels of diversity. Across all results, as well as when one considers results for loci binned by genetic distance to the nearest gene, X-linked loci always show lower diversity than autosomal loci. Absolute X/A diversity increases significantly with distance to genes in all populations, highlighting the more

Please cite this article in press as: Arbiza et al., Contrasting X-Linked and Autosomal Diversity across 14 Human Populations, The American Journal of Human Genetics (2014), http://dx.doi.org/10.1016/j.ajhg.2014.04.011

prominent role of diversity-reducing selection on X. This result also highlights the importance of considering the effects of selection; we accomplished this by considering estimates as a function of distance from genes and focusing on loci very far from genes to minimize its impact. However, estimates of absolute X/A diversity in single populations can be sensitive to a wide array of factors that differentially affect X-linked and autosomal estimates and thus lead to biased estimates of X/A diversity. One of these factors is accounting for differences in mutation rates between X and the autosomes in an unbiased manner. Our results show that the influence of ancestral polymorphisms accounts for most of the difference in the estimates of absolute X/A diversity normalized with macaque and orangutan, which can help explain some of the discrepancies between previously published estimates.1,2,6–8 After correction for these effects, the remaining difference of ~2% in estimates between the two outgroups could be explained by variation in the maleto-female mutation rate in primate evolution.15 Although correcting for both of these factors would account for the most significant biases due to normalization by divergence from different outgroups,12 the accuracy of the estimates of absolute X/A diversity for each population would still depend on various assumptions and relatively large levels of uncertainty for all three parameters in the primate tree: split times, ancestral population sizes,46,51,53 and estimates of the variation in male-to-female mutation rates across the different branches.15 An alternative strategy, reducing the number of variables required to account for the variation of mutation rates between both genomic compartments, would be to use direct estimates of the male-to-female mutation rate (a) in humans to account for differences in X-linked and autosomal mutation rates: Nx / Na ¼ (pX / pA) / (mX / uA), where the X/A mutation rate in humans, mX / uA, is (2/3) 3 (2 þ a) / (1 þ a). Although different studies have yielded varying estimates of a in humans, a reasonable estimate of a z 415,77 would imply an X/A mutation rate in humans of 0.8 instead of the 0.89 or 0.91 observed with macaque or orangutan, respectively. This suggests ~12% higher absolute X/A diversities in loci far from genes, even more significantly exceeding the expectation of previous demographic models in all 14 human populations evaluated. Several additional factors can combine with those associated with the normalization of differences in mutation rates, leading to biased estimates of absolute X/A diversity. These include differential ascertainment biases and the assumptions made when variants are called in different parts of the genome. Although next-generation sequencing has alleviated many such issues in comparison to genotyping-based studies of previously discovered SNPs, differences in sample size and systematic variations in readcoverage depth between X-linked and autosomal regions78,79 can lead to lower accuracy and power in calling X-linked SNPs than in calling autosomal variants. We minimized the effect of these biases by considering only

females for analysis and the 1000 Genomes phase 1 pipeline, which involves SNP-calling algorithms that explicitly account for the hemizygosity of X in males.32 However, we do not claim to have fully accounted for all factors differentially affecting diversity estimates on X and the autosomes. To eliminate any remaining biases differentially affecting X-linked and autosomal diversity, we instead considered estimates of relative X/A diversity; because this is a ratio of absolute X/A diversity between a pair of populations, biases that have the same effects on the two compared populations cancel out. This statistic also facilitates inferences about more recent events in human history given that it is affected by the epoch since the split of the two populations rather than the entire period from the most recent common ancestor of a sample, which is about an order of magnitude longer. Teasing apart factors shaping differences between populations from those in their shared genealogical history provided us with better interpretation of possible sex-biased processes beyond those shared by all human populations. Results based on relative autosomal, X-linked, and X/A diversity confirm previously observed patterns from a single African and a single non-African population1,2 while showing how these generalize across continental groups. We observed a general trend of lower X/A diversity in non-Africans than in Africans at both the population and the continental levels, whereas relative to Europeans, East Asian populations showed some evidence of an even larger reduction. Importantly, estimates of relative X/A diversity (unlike those of absolute X/A diversity) were invariant across bins of genetic distance to the nearest gene, increasing the power of the approach by allowing inferences from a larger set of genomic loci and suggesting that the relative differences in the impact of diversityreducing selection between X and the autosomes have in general been similar between different populations. An open question in the literature is the degree to which patterns of relative X/A diversity can be explained by sexunbiased differences in population-size changes in human history and whether additional evolutionary forces that differentially affect X-linked and autosomal diversity between populations need to be invoked. Results for many of the most recently published models of human demographic history suggest that the expected effect of simple population-size changes alone can explain part, but not all, of the observed reduction in relative X/A diversity between each non-African population and each African population, consistent with previous observations.1,2 These results provide evidence that additional demographic factors need to be invoked for explaining these differences. Finally, we note two additional factors that could potentially affect inferences based on estimates of relative X/A diversity: (1) particular selection scenarios, which, although unlikely, could still affect X and the autosomes differently, such as strong population-specific selection on a few loci or widespread selection across a

The American Journal of Human Genetics 94, 1–18, June 5, 2014 15

Please cite this article in press as: Arbiza et al., Contrasting X-Linked and Autosomal Diversity across 14 Human Populations, The American Journal of Human Genetics (2014), http://dx.doi.org/10.1016/j.ajhg.2014.04.011

large portion of X; and (2) Neandertal introgression, which will increase nucleotide diversity and which not only has been shown to have been higher on the autosomes than on X-linked loci but has also occurred almost exclusively in non-African populations.80 To account for this, we further removed human haplotypes confidently inferred to be of Neandertal origin80 and re-estimated relative X/A diversity. Between EUR and AFR populations and between ASN and AFR populations, the new estimates were ~1% higher across all comparisons, showing that Neandertal introgression can lead to a modest underestimation of relative X/A diversity. An improved understanding of the magnitude, types, and geographical ranges of archaic introgression will allow for a better understanding of the full impact of this phenomenon in future studies. In conclusion, our results provide strong evidence that additional factors beyond the simple population-size changes and migration accounted for by many of the recent models of human demography are likely to have played a role in shaping relative patterns of X/A diversity genome-wide between African and non-African populations in general. These results, based on the consideration of a wide range of human demographic models, extend previous results based also on comparisons of the site frequency spectrum and population-differentiation estimates,2,4,18 which have suggested the types of demographic factors that need to be invoked for explaining them. These include a variety of forces that can reduce relative X/A diversity during the Out-of-Africa event, such as not only more extreme founder events than estimated in the above models but also importantly male-biased migrations,2 reductions in effective polygyny, or other sexbiased processes. Supplemental Data Supplemental Data include ten figures and six tables and can be found with this article online at http://dx.doi.org/10.1016/j. ajhg.2014.04.011.

Acknowledgments We thank Diana Chang and Aaron Sams for comments and discussions on earlier versions of the manuscript. This work was supported in part by NIH grants R01-HG006849 (A.K.), U01HG005715 (A.K.), and R01-GM102192 (A.S.) and by an award from the Ellison Medical Foundation (A.K.). Received: March 5, 2014 Accepted: April 15, 2014 Published: May 15, 2014

Web Resources The URLs for data presented herein are as follows: 1000 Genomes Phase 1 Consensus Set, ftp://ftp-trace.ncbi.nih.gov/ 1000genomes/ftp/phase1/analysis_results/integrated_call_sets

16 The American Journal of Human Genetics 94, 1–18, June 5, 2014

HapMap recombination rates lifted over to GRCh37 coordinates, http://hapmap.ncbi.nlm.nih.gov/downloads/recombination/ 2011-01_phaseII_B37

References 1. Gottipati, S., Arbiza, L., Siepel, A., Clark, A.G., and Keinan, A. (2011). Analyses of X-linked and autosomal genetic variation in population-scale whole genome sequencing. Nat. Genet. 43, 741–743. 2. Keinan, A., Mullikin, J.C., Patterson, N., and Reich, D. (2009). Accelerated genetic drift on chromosome X during the human dispersal out of Africa. Nat. Genet. 41, 66–70. 3. Keinan, A., and Reich, D. (2010). Human population differentiation is strongly correlated with local recombination rate. PLoS Genet. 6, e1000886. 4. Keinan, A., and Reich, D. (2010). Can a sex-biased human demography account for the reduced effective population size of chromosome X in non-Africans? Mol. Biol. Evol. 27, 2312–2321. 5. Lohmueller, K.E., Degenhardt, J.D., and Keinan, A. (2010). Sex-averaged recombination and mutation rates on the X chromosome: a comment on Labuda et al. Am. J. Hum. Genet. 86, 978–980, author reply 980–981. 6. Hammer, M.F., Garrigan, D., Wood, E., Wilder, J.A., Mobasher, Z., Bigham, A., Krenz, J.G., and Nachman, M.W. (2004). Heterogeneous patterns of variation among multiple human x-linked Loci: the possible role of diversity-reducing selection in non-africans. Genetics 167, 1841–1853. 7. Hammer, M.F., Mendez, F.L., Cox, M.P., Woerner, A.E., and Wall, J.D. (2008). Sex-biased evolutionary forces shape genomic patterns of human diversity. PLoS Genet. 4, e1000202. 8. Hammer, M.F., Woerner, A.E., Mendez, F.L., Watkins, J.C., Cox, M.P., and Wall, J.D. (2010). The ratio of human X chromosome to autosome diversity is positively correlated with genetic distance from genes. Nat. Genet. 42, 830–831. 9. Wilder, J.A., Mobasher, Z., and Hammer, M.F. (2004). Genetic evidence for unequal effective population sizes of human females and males. Mol. Biol. Evol. 21, 2047–2057. 10. Mank, J.E., Vicoso, B., Berlin, S., and Charlesworth, B. (2010). Effective population size and the Faster-X effect: empirical results and their interpretation. Evolution 64, 663–674. 11. Vicoso, B., and Charlesworth, B. (2009). Effective population size and the faster-X effect: an extended model. Evolution 63, 2413–2426. 12. Bustamante, C.D., and Ramachandran, S. (2009). Evaluating signatures of sex-specific processes in the human genome. Nat. Genet. 41, 8–10. 13. Labuda, D., Lefebvre, J.F., Nadeau, P., and Roy-Gagnon, M.H. (2010). Female-to-male breeding ratio in modern humans-an analysis based on historical recombinations. Am. J. Hum. Genet. 86, 353–363. 14. Ellegren, H. (2009). The different levels of genetic diversity in sex chromosomes and autosomes. Trends Genet. 25, 278–284. 15. Xu, K., Oh, S., Park, T., Presgraves, D.C., and Yi, S.V. (2012). Lineage-specific variation in slow- and fast-X evolution in primates. Evolution 66, 1751–1761. 16. Lambert, C.A., Connelly, C.F., Madeoy, J., Qiu, R., Olson, M.V., and Akey, J.M. (2010). Highly punctuated patterns of population structure on the X chromosome and

Please cite this article in press as: Arbiza et al., Contrasting X-Linked and Autosomal Diversity across 14 Human Populations, The American Journal of Human Genetics (2014), http://dx.doi.org/10.1016/j.ajhg.2014.04.011

17.

18.

19.

20.

21.

22.

23. 24.

25.

26.

27.

28.

29.

30.

31.

32.

33.

implications for African evolutionary history. Am. J. Hum. Genet. 86, 34–44. Casto, A.M., Li, J.Z., Absher, D., Myers, R., Ramachandran, S., and Feldman, M.W. (2010). Characterization of X-linked SNP genotypic variation in globally distributed human populations. Genome Biol. 11, R10. Emery, L.S., Felsenstein, J., and Akey, J.M. (2010). Estimators of the human effective sex ratio detect sex biases on different timescales. Am. J. Hum. Genet. 87, 848–856. Kong, A., Frigge, M.L., Masson, G., Besenbacher, S., Sulem, P., Magnusson, G., Gudjonsson, S.A., Sigurdsson, A., Jonasdottir, A., Jonasdottir, A., et al. (2012). Rate of de novo mutations and the importance of father’s age to disease risk. Nature 488, 471–475. Sun, J.X., Helgason, A., Masson, G., Ebenesersdo´ttir, S.S., Li, H., Mallick, S., Gnerre, S., Patterson, N., Kong, A., Reich, D., and Stefansson, K. (2012). A direct characterization of human mutation based on microsatellites. Nat. Genet. 44, 1161– 1165. Heyer, E., Chaix, R., Pavard, S., and Austerlitz, F. (2012). Sex-specific demographic behaviours that shape human genomic variation. Mol. Ecol. 21, 597–612. Corl, A., and Ellegren, H. (2012). The genomic signature of sexual selection in the genetic diversity of the sex chromosomes and autosomes. Evolution 66, 2138–2149. Bachtrog, D. (2011). Plant sex chromosomes: a non-degenerated Y? Curr. Biol. 21, R685–R688. Evans, B.J., and Charlesworth, B. (2013). The effect of nonindependent mate pairing on the effective population size. Genetics 193, 545–556. Pool, J.E., and Nielsen, R. (2007). Population size changes reshape genomic patterns of diversity. Evolution 61, 3001– 3006. Charlesworth, B., Coyne, J.A., and Barton, N.H. (1987). The relative rates of evolution of sex chromosomes and autosomes. Am. Nat. 130, 113–146. Charlesworth, B., Morgan, M.T., and Charlesworth, D. (1993). The effect of deleterious mutations on neutral molecular variation. Genetics 134, 1289–1303. Andolfatto, P. (2001). Contrasting patterns of X-linked and autosomal nucleotide variation in Drosophila melanogaster and Drosophila simulans. Mol. Biol. Evol. 18, 279–290. Wall, J.D., Andolfatto, P., and Przeworski, M. (2002). Testing models of selection and demography in Drosophila simulans. Genetics 162, 203–216. Betancourt, A.J., Kim, Y., and Orr, H.A. (2004). A pseudohitchhiking model of X vs. autosomal diversity. Genetics 168, 2261–2269. Amato, R., Pinelli, M., Monticelli, A., Marino, D., Miele, G., and Cocozza, S. (2009). Genome-wide scan for signatures of human population differentiation and their relationship with natural selection, functional pathways and diseases. PLoS ONE 4, e7927. Abecasis, G.R., Auton, A., Brooks, L.D., DePristo, M.A., Durbin, R.M., Handsaker, R.E., Kang, H.M., Marth, G.T., and McVean, G.A.; 1000 Genomes Project Consortium (2012). An integrated map of genetic variation from 1,092 human genomes. Nature 491, 56–65. Betzig, L. (2012). Means, variances, and ranges in reproductive success: Comparative evidence. Evol. Hum. Behav. 33, 309–317.

34. Kent, W.J., Sugnet, C.W., Furey, T.S., Roskin, K.M., Pringle, T.H., Zahler, A.M., and Haussler, D. (2002). The human genome browser at UCSC. Genome Res. 12, 996–1006. 35. Benson, G. (1999). Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 27, 573–580. 36. Gardiner-Garden, M., and Frommer, M. (1987). CpG islands in vertebrate genomes. J. Mol. Biol. 196, 261–282. 37. Pollard, K.S., Hubisz, M.J., Rosenbloom, K.R., and Siepel, A. (2010). Detection of nonneutral substitution rates on mammalian phylogenies. Genome Res. 20, 110–121. 38. Bailey, J.A., Gu, Z., Clark, R.A., Reinert, K., Samonte, R.V., Schwartz, S., Adams, M.D., Myers, E.W., Li, P.W., and Eichler, E.E. (2002). Recent segmental duplications in the human genome. Science 297, 1003–1007. 39. Hsu, F., Kent, W.J., Clawson, H., Kuhn, R.M., Diekhans, M., and Haussler, D. (2006). The UCSC Known Genes. Bioinformatics 22, 1036–1046. 40. Pruitt, K.D., Tatusova, T., and Maglott, D.R. (2007). NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 35 (Database issue), D61–D65. 41. Harrow, J., Denoeud, F., Frankish, A., Reymond, A., Chen, C.K., Chrast, J., Lagarde, J., Gilbert, J.G., Storey, R., Swarbreck, D., et al. (2006). GENCODE: producing a reference annotation for ENCODE. Genome Biol. 7 (Suppl 1 ), 1–9. 42. Kent, W.J., Baertsch, R., Hinrichs, A., Miller, W., and Haussler, D. (2003). Evolution’s cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc. Natl. Acad. Sci. USA 100, 11484–11489. 43. Siepel, A., Bejerano, G., Pedersen, J.S., Hinrichs, A.S., Hou, M., Rosenbloom, K., Clawson, H., Spieth, J., Hillier, L.W., Richards, S., et al. (2005). Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 15, 1034–1050. 44. Blanchette, M., Kent, W.J., Riemer, C., Elnitski, L., Smit, A.F., Roskin, K.M., Baertsch, R., Rosenbloom, K., Clawson, H., Green, E.D., et al. (2004). Aligning multiple genomic sequences with the threaded blockset aligner. Genome Res. 14, 708–715. 45. Gibbs, R.A., Rogers, J., Katze, M.G., Bumgarner, R., Weinstock, G.M., Mardis, E.R., Remington, K.A., Strausberg, R.L., Venter, J.C., Wilson, R.K., et al.; Rhesus Macaque Genome Sequencing and Analysis Consortium (2007). Evolutionary and biomedical insights from the rhesus macaque genome. Science 316, 222–234. 46. Locke, D.P., Hillier, L.W., Warren, W.C., Worley, K.C., Nazareth, L.V., Muzny, D.M., Yang, S.P., Wang, Z., Chinwalla, A.T., Minx, P., et al. (2011). Comparative and demographic analysis of orang-utan genomes. Nature 469, 529–533. 47. Jukes, T.H., and Cantor, C.R. (1969). Evolution of protein molecules. In Mammalian protein metabolism, H.N. Munro, ed. (New York: Academic Press), pp. 21–123. 48. Lahiri, S.N. (2003). Resampling methods for dependent data (New York: Springer). 49. Liu, R.Y., and Singh, K. (1992). Moving blocks jackknife and bootstrap capture weak dependence. In Exploring the limits of bootstrap, R. LePage and L. Billard, eds. (New York: John Wiley), pp. 225–248. 50. Keinan, A., Mullikin, J.C., Patterson, N., and Reich, D. (2007). Measurement of the human allele frequency spectrum demonstrates greater genetic drift in East Asians than in Europeans. Nat. Genet. 39, 1251–1255.

The American Journal of Human Genetics 94, 1–18, June 5, 2014 17

Please cite this article in press as: Arbiza et al., Contrasting X-Linked and Autosomal Diversity across 14 Human Populations, The American Journal of Human Genetics (2014), http://dx.doi.org/10.1016/j.ajhg.2014.04.011

51. Burgess, R., and Yang, Z. (2008). Estimation of hominoid ancestral population sizes under bayesian coalescent models incorporating mutation rate variation and sequencing errors. Mol. Biol. Evol. 25, 1979–1994. 52. Siepel, A. (2009). Phylogenomics of primates and their ancestral populations. Genome Res. 19, 1929–1941. 53. Scally, A., Dutheil, J.Y., Hillier, L.W., Jordan, G.E., Goodhead, I., Herrero, J., Hobolth, A., Lappalainen, T., Mailund, T., Marques-Bonet, T., et al. (2012). Insights into hominid evolution from the gorilla genome sequence. Nature 483, 169–175. 54. International HapMap Consortium (2005). A haplotype map of the human genome. Nature 437, 1299–1320. 55. Frazer, K.A., Ballinger, D.G., Cox, D.R., Hinds, D.A., Stuve, L.L., Gibbs, R.A., Belmont, J.W., Boudreau, A., Hardenbol, P., Leal, S.M., et al.; International HapMap Consortium (2007). A second generation human haplotype map of over 3.1 million SNPs. Nature 449, 851–861. 56. Gazave, E., Ma, L., Chang, D., Coventry, A., Gao, F., Muzny, D., Boerwinkle, E., Gibbs, R.A., Sing, C.F., Clark, A.G., and Keinan, A. (2014). Neutral genomic regions refine models of recent rapid human population growth. Proc. Natl. Acad. Sci. USA 111, 757–762. 57. Gutenkunst, R.N., Hernandez, R.D., Williamson, S.H., and Bustamante, C.D. (2009). Inferring the joint demographic history of multiple populations from multidimensional SNP frequency data. PLoS Genet. 5, e1000695. 58. Gravel, S., Henn, B.M., Gutenkunst, R.N., Indap, A.R., Marth, G.T., Clark, A.G., Yu, F., Gibbs, R.A., and Bustamante, C.D.; 1000 Genomes Project (2011). Demographic history and rare allele sharing among human populations. Proc. Natl. Acad. Sci. USA 108, 11983–11988. 59. Nelson, M.R., Wegmann, D., Ehm, M.G., Kessner, D., St Jean, P., Verzilli, C., Shen, J., Tang, Z., Bacanu, S.A., Fraser, D., et al. (2012). An abundance of rare functional variants in 202 drug target genes sequenced in 14,002 people. Science 337, 100–104. 60. Schaffner, S.F., Foo, C., Gabriel, S., Reich, D., Daly, M.J., and Altshuler, D. (2005). Calibrating a coalescent simulation of human genome sequence variation. Genome Res. 15, 1576– 1583. 61. Tennessen, J.A., Bigham, A.W., O’Connor, T.D., Fu, W., Kenny, E.E., Gravel, S., McGee, S., Do, R., Liu, X., Jun, G., et al.; Broad GO; Seattle GO; NHLBI Exome Sequencing Project (2012). Evolution and functional impact of rare coding variation from deep sequencing of human exomes. Science 337, 64–69. 62. Marth, G.T., Czabarka, E., Murvai, J., and Sherry, S.T. (2004). The allele frequency spectrum in genome-wide human variation data reveals signals of differential demographic history in three large world populations. Genetics 166, 351–372. 63. Hudson, R.R. (2002). Generating samples under a WrightFisher neutral model of genetic variation. Bioinformatics 18, 337–338. 64. Abecasis, G.R., Altshuler, D., Auton, A., Brooks, L.D., Durbin, R.M., Gibbs, R.A., Hurles, M.E., and McVean, G.A.; 1000 Genomes Project Consortium (2010). A map of human genome variation from population-scale sequencing. Nature 467, 1061–1073. 65. Kidd, J.M., Gravel, S., Byrnes, J., Moreno-Estrada, A., Musharoff, S., Bryc, K., Degenhardt, J.D., Brisbin, A., Sheth, V., Chen,

18 The American Journal of Human Genetics 94, 1–18, June 5, 2014

66.

67.

68.

69.

70.

71.

72.

73.

74.

75.

76.

77.

78.

79.

80.

R., et al. (2012). Population genetic inference from personal genome data: impact of ancestry and admixture on human genomic variation. Am. J. Hum. Genet. 91, 660–671. Voight, B.F., Adams, A.M., Frisse, L.A., Qian, Y., Hudson, R.R., and Di Rienzo, A. (2005). Interrogating multiple aspects of variation in a full resequencing data set to infer human population size changes. Proc. Natl. Acad. Sci. USA 102, 18508– 18513. Bryc, K., Velez, C., Karafet, T., Moreno-Estrada, A., Reynolds, A., Auton, A., Hammer, M., Bustamante, C.D., and Ostrer, H. (2010). Colloquium paper: genome-wide patterns of population structure and admixture among Hispanic/Latino populations. Proc. Natl. Acad. Sci. USA 107 (Suppl 2 ), 8954–8961. Bedoya, G., Montoya, P., Garcı´a, J., Soto, I., Bourgeois, S., Carvajal, L., Labuda, D., Alvarez, V., Ospina, J., Hedrick, P.W., and Ruiz-Linares, A. (2006). Admixture dynamics in Hispanics: a shift in the nuclear genetic ancestry of a South American population isolate. Proc. Natl. Acad. Sci. USA 103, 7234–7239. McVicker, G., Gordon, D., Davis, C., and Green, P. (2009). Widespread genomic signatures of natural selection in hominid evolution. PLoS Genet. 5, e1000471. Haddrill, P.R., Thornton, K.R., Charlesworth, B., and Andolfatto, P. (2005). Multilocus patterns of nucleotide variability and the demographic and selection history of Drosophila melanogaster populations. Genome Res. 15, 790–799. Begun, D.J., and Aquadro, C.F. (1993). African and North American populations of Drosophila melanogaster are very different at the DNA level. Nature 365, 548–550. Begun, D.J., and Aquadro, C.F. (1995). Evolution at the tip and base of the X chromosome in an African population of Drosophila melanogaster. Mol. Biol. Evol. 12, 382–390. Begun, D.J., Holloway, A.K., Stevens, K., Hillier, L.W., Poh, Y.P., Hahn, M.W., Nista, P.M., Jones, C.D., Kern, A.D., Dewey, C.N., et al. (2007). Population genomics: whole-genome analysis of polymorphism and divergence in Drosophila simulans. PLoS Biol. 5, e310. Begun, D.J., and Whitley, P. (2000). Reduced X-linked nucleotide polymorphism in Drosophila simulans. Proc. Natl. Acad. Sci. USA 97, 5960–5965. Li, J.Z., Absher, D.M., Tang, H., Southwick, A.M., Casto, A.M., Ramachandran, S., Cann, H.M., Barsh, G.S., Feldman, M., Cavalli-Sforza, L.L., and Myers, R.M. (2008). Worldwide human relationships inferred from genome-wide patterns of variation. Science 319, 1100–1104. Keinan, A., and Clark, A.G. (2012). Recent explosive human population growth has resulted in an excess of rare genetic variants. Science 336, 740–743. Kong, A., Gudbjartsson, D.F., Sainz, J., Jonsdottir, G.M., Gudjonsson, S.A., Richardsson, B., Sigurdardottir, S., Barnard, J., Hallbeck, B., Masson, G., et al. (2002). A high-resolution recombination map of the human genome. Nat. Genet. 31, 241–247. Eberle, M.A., and Kruglyak, L. (2000). An analysis of strategies for discovery of single-nucleotide polymorphisms. Genet. Epidemiol. 19 (Suppl 1 ), S29–S35. Crawford, J.E., and Lazzaro, B.P. (2012). Assessing the accuracy and power of population genetic inference from low-pass next-generation sequencing data. Front. Genet 3, 66. ¨ fer, K., Sankararaman, S., Mallick, S., Dannemann, M., Pru Kelso, J., Pa¨a¨bo, S., Patterson, N., and Reich, D. (2014). The genomic landscape of Neanderthal ancestry in present-day humans. Nature 507, 354–357.

Contrasting X-linked and autosomal diversity across 14 human populations.

Contrasting the genetic diversity of the human X chromosome (X) and autosomes has facilitated understanding historical differences between males and f...
3MB Sizes 2 Downloads 3 Views