The annual meeting of the American Society of Human Genetics will take place from 6 to 10 November in San Francisco. The posters can be searched online from the ASHG meeting website. The following three abstracts will be of particular interest to the genetic genealogy community.
The GenoChip: a new tool for genetic anthropology
S. Wells, E. Greenspan, S. Staats, T. Krahn, C. Tyler-Smith, Y. Xue, S. Tofanelli, P. Francalacci, F. Cucca, L. Pagani, L. Jin, H. Li, T. G. Schurr, J. B. Gaieski, C. Melendez, M. G. Vilar, A. C. Owings, R. Gomez, R. Fujita, F. Santos, D. Comas, O. Balanovsky, E. Balanovska, P. Zalloua, H. Soodyall, R. Pitchappan, G. Arun Kumar, M. F. Hammer, B. Greenspan, E. Elhaik
Background: The Genographic Project is an international effort aimed at charting human history using genetic data. The project is non-profit and non-medical, and through the sale of its public participation kits it supports cultural preservation efforts in indigenous and traditional communities. To extend our knowledge of the human journey, interbreeding with ancient hominins, and modern human demographic history, we designed a genotyping chip optimized for genetic anthropology research. Methods: Our goal was to design, produce, and validate a SNP array dedicated to genetic anthropology. The GenoChip is an Illumina HD iSelect genotyping bead array with over 130,000 highly informative autosomal and X-chromosomal SNPs ascertained from over 450 worldwide populations, ~13,000 Y-chromosomal SNPs, and ~3,000 mtDNA SNPs. To determine the extent of gene flow from archaic hominins to modern humans, we included over 25,000 SNPs from candidate regions of interbreeding between extinct hominins (Neanderthal and Denisovan) and modern humans. To avoid any inadvertent medical testing we filtered out all SNPs that have known or suspected health or functional associations. We validated the chip by genotyping over 1,000 samples from 1000 Genomes, Family Tree DNA, and Genographic Project populations. Results: The concordance between the GenoChip and the 1000 Genomes data was over 99.5%. The GenoChip has a SNP density of approximately (1/100,000) bases over 92% of the human genome and is highly compatible with Illumina and Affymetrix commercial platforms. The ~10,000 novel Y SNPs included on the chip have greatly refined our understanding of the Y-chromosome phylogenetic tree. By including Y and mtDNA SNPs on an unprecedented scale, the GenoChip is able to delineate extremely detailed human migratory paths. The autosomal and X-chromosomal markers included on the GenoChip have revealed novel patterns of ancestry that shed a detailed new light on human history. 
Interbreeding analysis with extinct hominids confirmed some previous reports and allowed us to describe the modern geographical distribution of these markers in detail. Conclusions: The GenoChip is the first genotyping chip completely dedicated to genetic anthropology with no known medically relevant markers. We anticipate that the large-scale application of the GenoChip using the Genographic Project’s diverse sample collection will provide new insights into genetic anthropology and human history.
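The abstract reports a concordance of over 99.5% between the GenoChip and the 1000 Genomes data but does not say how the figure was computed. As a rough illustration only, one common definition is the fraction of sites called in both datasets at which the unphased genotypes agree. The function name, the dictionary layout, and the sample calls below are hypothetical and are not taken from the abstract.

```python
def genotype_concordance(calls_a, calls_b):
    """Return the fraction of shared, called sites with matching genotypes.

    calls_a, calls_b: dicts mapping a SNP identifier to a genotype string
    such as "AG", or to None for a no-call. This layout is an assumption
    made for the sketch, not a format described in the abstract.
    """
    # Only sites with a call in BOTH datasets can be compared.
    shared = [
        snp for snp in calls_a
        if calls_a[snp] is not None and calls_b.get(snp) is not None
    ]
    if not shared:
        raise ValueError("no sites called in both datasets")

    # Sort the alleles so that "AG" and "GA" count as the same genotype.
    matches = sum(
        1 for snp in shared
        if sorted(calls_a[snp]) == sorted(calls_b[snp])
    )
    return matches / len(shared)


# Made-up calls for illustration; rs4 is a no-call on the chip.
chip_calls = {"rs1": "AG", "rs2": "CC", "rs3": "TT", "rs4": None}
reference_calls = {"rs1": "GA", "rs2": "CT", "rs3": "TT", "rs4": "AA"}
rate = genotype_concordance(chip_calls, reference_calls)
```

Here three sites are comparable and two of them agree, so the rate is two thirds. Excluding no-calls from the denominator is a design choice; a stricter definition would count them as discordant.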
People of the British Isles: An analysis of the genetic contributions of European populations to a UK control population
S. Leslie, B. Winney, G. Hellenthal, S. Myers, P. Donnelly, W. Bodmer
There is much interest in fine scale population structure in the UK, as a signature of historical migration events and because of the effect population structure may have on disease association studies. Population structure appears to have a minor impact on the current generation of genome-wide association studies, but will probably be important for the next generation of studies seeking associations to rare variants. Furthermore, there is great interest in understanding where the British people came from. Thus far genetic studies have been limited to a small number of markers or to samples not collected specifically to address these questions. A natural method for understanding population structure is to control and document carefully the provenance of samples. We describe the collection of a cohort of rural UK samples (The People of the British Isles), aimed at providing a well-characterised UK control population. This will be a resource for the research community as well as providing fine-scale genetic information on the history of the British. Using a novel clustering algorithm, approximately 2000 samples were clustered purely as a function of genetic similarity, without reference to their known sampling locations. When each individual is plotted on a UK map, there is a striking association between inferred clusters and geography, reflecting to a major extent the known history of the British peoples. A similar analysis is performed on samples from different parts of Europe. Using the European samples as ‘source populations’ we apply a novel algorithm to determine the proportion of the genomes within each of the derived British clusters that are most closely related to each of the source populations. Thus we can observe the relative contribution (under our model) of each of these European populations to the genomes of samples in different regions of Britain.
Our results strikingly reflect much of the known historical and archaeological record while raising some important questions and perhaps answering others. We believe this is the first detailed analysis of very fine-scale genetic structure and its origin in a population of very similar humans. This has been achieved through both a careful sampling strategy and an approach to analysis that accounts for linkage disequilibrium.
Inferring Y Chromosome Phylogeny by Sequencing Diverse Populations
G. D. Poznik, P. A. Underhill, B. M. Henn, M. C. Yee, E. Sliwerska, G. M. Euskirchen, L. Quintana-Murci, E. Patin, M. Snyder, J. M. Kidd, C. D. Bustamante
The male-specific region of the Y chromosome (MSY) harbors the longest stretch of non-recombining DNA in the human genome and is therefore a unique tool that enables the tracking of migrations and inference of demographic history. We have sequenced 69 male samples from nine globally diverse populations, including three African hunter-gatherer groups. Due to inefficient selection, a relatively high mutation rate, and a small effective population size, the Y chromosome is particularly subject to drift. It has accumulated large expanses of highly repetitive sequence, which pose a considerable challenge within a short read sequencing paradigm. To overcome this hurdle, we have built an informatics pipeline to reliably call Y chromosome alleles from moderate coverage short read shotgun sequence data. First, we defined a callability mask, learned from the mapping quality and depth of coverage patterns in the data, and then we tuned base-pair level quality control thresholds. Based on 13,000 provisional SNP calls, we inferred a tree of the 69 sequenced Y chromosomes. Using this tree, we then called individual genotypes for each SNP with a custom-built, phylogeny-aware EM algorithm. With these high quality calls in hand, samples were assigned haplogroup labels using standard YCC nomenclature; 29 distinct named haplogroups were represented. We find that the maximum likelihood tree we construct recapitulates the extant Y chromosome phylogeny, thus confirming the fruits of decades of work based on ascertained SNPs. Further, we resolve a major long-standing polytomy by identifying a variant for which one haplogroup retains the ancestral allele, whereas its brother clades share the derived allele, thus indicating common ancestry and uniting the latter two branches. This finding has been confirmed by genotyping a larger panel. Finally, we estimate the MSY rate of mutation recurrence and the time to the most recent common ancestor of the sampled chromosomes.
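The abstract mentions a callability mask learned from mapping quality and depth of coverage but gives no details. The sketch below is therefore not the authors' method. It only illustrates the general idea with fixed, hypothetical thresholds: a position is treated as callable when its depth is close to the median depth and its mapping quality is high, since collapsed repeats tend to show excess depth and ambiguously placed reads tend to have low mapping quality. The function name, parameters, default values, and input data are all assumptions.

```python
from statistics import median


def callability_mask(depth, mapq, min_mapq=50.0, depth_tolerance=0.5):
    """Flag positions that look reliable enough for SNP calling.

    depth: per-position read depth.
    mapq:  per-position mean mapping quality.
    min_mapq and depth_tolerance are illustrative thresholds chosen
    for this sketch; the abstract does not report the values used.
    """
    if len(depth) != len(mapq):
        raise ValueError("depth and mapq must have the same length")

    # Use the median depth as the "typical" coverage, then accept
    # positions within +/- depth_tolerance of that value.
    typical = median(depth)
    low = typical * (1 - depth_tolerance)
    high = typical * (1 + depth_tolerance)

    return [
        low <= d <= high and q >= min_mapq
        for d, q in zip(depth, mapq)
    ]


# Made-up values: position 3 has low mapping quality, position 4 has
# excess depth (as in a collapsed repeat), position 6 is barely covered.
example_depth = [10, 11, 9, 40, 10, 2]
example_mapq = [60, 60, 20, 60, 60, 60]
mask = callability_mask(example_depth, example_mapq)
```

With these inputs the median depth is 10, so the accepted depth range is 5 to 15, and only the first, second, and fifth positions pass both filters. A mask learned from the data, as the abstract describes, would replace the fixed thresholds with values estimated from the observed depth and mapping quality distributions.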