Friday 8 November 2013

A simplified Y-tree and a common standard for Y-DNA haplogroup and SNP nomenclature

This article is for experienced genetic genealogists and requires an understanding of SNPs and haplogroups.

A very useful online resource for Y-chromosome researchers in the form of a simplified version of the Y-chromosome SNP tree has come online this week. The new pared-down version of the Y-tree is introduced in a paper by Mannis Van Oven, Anneleen Van Geystelen, Manfred Kayser, Ronny Decorte and Maarten H D Larmuseau entitled Seeing the Wood for the Trees: A Minimal Reference Phylogeny for the Human Y Chromosome. The paper has been accepted for publication in the scientific journal Human Mutation but has yet to go through the full editorial process. Mannis Van Oven's name is already well known to mitochondrial DNA researchers because he maintains the Phylotree website which hosts the definitive mtDNA tree. The simplified Y-tree is conveniently being maintained on the same website and can be found at www.phylotree.org/Y

The new Phylotree version of the Y-tree will serve as a complement to the full Y-SNP tree which is maintained by ISOGG (the International Society of Genetic Genealogy). The Y-tree is now a very complicated structure and is set to become even more detailed in the coming months with the flood of new Y-SNPs that are being discovered from academic projects and through commercial testing with Full Genomes Corp, the Genographic Project (Geno 2.0) and BritainsDNA/ScotlandsDNA (Chromo 2). There will always be a need to have the fine detail of the full high-resolution tree, especially when one is trying to drill right down to the low-hanging branches. However, sometimes it's useful to get an overview of the structure of the tree as a whole without the complication of all the additional sub-branches, twigs and twiglets, and this is something that the new Phylotree Y-tree does very nicely.

I'm very pleased to see that the paper acknowledges the contributions made by the many "independent researchers" within the genetic genealogy community. The resources that the authors used to compile their reference phylogeny included "a large number of websites maintained by independent researchers", all of whom are named in the acknowledgements.

An important innovation in this paper is a very welcome attempt to introduce a much-needed common standard for Y-SNP and Y-haplogroup nomenclature. As the authors explain, "Due to multiple independent discovery events, a considerable number of Y-SNPs are known by multiple names". This diversity of names is a source of considerable confusion for both academic researchers and genetic genealogists. For example, haplogroup R1b1a2, the predominant European haplogroup, has two major branches. The markers that define these branches are known as P312 and U106 at the Genographic Project and Family Tree DNA but have the alternative names S116 and S21 at BritainsDNA/ScotlandsDNA. All four of these marker names appear in the scientific literature but the scientists often don't provide the alternative names. ISOGG provides a Y-SNP index which allows the researcher to check for other SNP names but not every researcher will know of this resource. The solution proposed by Van Oven et al. is to decide on "one default name depending on which of the aliases is most frequently used in the literature", and these are the names which appear in the Phylotree Y-tree, though the alternative names are given in the accompanying spreadsheet.

It does of course remain to be seen if the scientists and testing companies will adopt the recommended nomenclature for the 417 SNPs included on the simplified Y-tree, but we can certainly hope that they will do so. Most of the names are already in use at Family Tree DNA and within the various FTDNA haplogroup projects. The one SNP name on the tree which will probably cause the most difficulties is R-M529, which is currently better known as L21 and sometimes S145. The name M529 seems to have been chosen because it was cited in an academic paper published in 2011 by Myres et al. [1]. However, the name L21 is now so ingrained in the collective genetic genealogy consciousness that I suspect that the proposed new name will probably not catch on. BritainsDNA have always used their own proprietary S series naming system but I hope that they will at least consider adopting the new nomenclature for the core SNPs included on the Phylotree Y-tree so that we can all speak a common language.
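For readers who like to tinker, the "one default name" idea can be sketched as a simple lookup table. This is only an illustration and not the layout of the Phylotree spreadsheet: the alias groups are the three examples mentioned in this post, the function names are made up, and treating P312 and U106 as the default names is an assumption (only M529 is stated above to be the name used on the tree).

```python
# Alias groups taken from the examples in this post.
# Key = assumed default name, value = other names for the same SNP.
# This is an illustrative layout, not the Phylotree spreadsheet format.
ALIAS_GROUPS = {
    "P312": ["S116"],
    "U106": ["S21"],
    "M529": ["L21", "S145"],
}


def build_alias_index(groups):
    """Map every known name (default or alias) to its default name."""
    index = {}
    for default, aliases in groups.items():
        for name in [default, *aliases]:
            index[name.upper()] = default
    return index


ALIAS_INDEX = build_alias_index(ALIAS_GROUPS)


def default_name(snp):
    """Return the default name for a SNP, or the input unchanged if unknown."""
    cleaned = snp.strip()
    return ALIAS_INDEX.get(cleaned.upper(), cleaned)
```

With a table like this, a result reported as S116 by one company and P312 by another resolves to the same label, and a name that is not in the table is passed through untouched rather than guessed at.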

In the coming months we can expect an explosion of new Y-SNPs now that the first results have started to come in for the Chromo 2 test from BritainsDNA/ScotlandsDNA and from the full Y-chromosome sequencing tests at Full Genomes. However, the nomenclature will continue to be a big problem as each company tries to maintain a competitive advantage. Full Genomes have already indicated that they will be offering custom single SNPs for sale to compete with FTDNA. We can probably expect to see a flood of FG SNPs being made available in the next few months. The positions of the new FG SNPs on the tree are not yet known so no other companies will be able to offer these new SNPs. So far I've only seen one data file from the BritainsDNA Chromo 2 test. This file contains over 14,000 Y-SNPs, of which at least 8000 are proprietary S series SNPs, only a tiny percentage of which are listed in the ISOGG Y-SNP index. It may be that many of the BritainsDNA SNPs will turn out to be equivalent to the SNPs that are already on the ISOGG tree or included on the Geno 2.0 chip, and these SNPs will almost certainly be included in the Full Genomes test. However, neither BritainsDNA nor the Genographic Project provide the genome reference positions for the SNPs on their chips so there is currently no way of knowing which S series SNPs are already known about and which ones are new. Fortunately there are many pioneers with deep pockets in the genetic genealogy community who can afford to have their DNA tested at Full Genomes, BritainsDNA and the Genographic Project. With data available for comparison from two or more companies it should then be possible for the volunteer haplogroup project administrators to compare the results and establish the positions of any newly discovered SNPs on the Y-tree.
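One way such a comparison might work, in the absence of published positions, is to look at people who have tested with two companies and pair up SNPs that are positive in exactly the same set of testers. The sketch below is an assumption about how this could be done, not a description of what project administrators actually do. The kit names and results are invented for illustration, and the function name is hypothetical. It also assumes every tester was tested for every SNP on each chip.

```python
from collections import defaultdict


def candidate_equivalents(results_a, results_b):
    """Pair SNPs from two companies that are positive in the same testers.

    Each argument maps a tester ID to the set of SNP names called
    positive (derived) for that tester at one company. Only testers
    present in both data sets are compared.
    """
    shared = set(results_a) & set(results_b)

    def profiles(results):
        # For each SNP, collect the shared testers who are positive for it.
        prof = defaultdict(set)
        for tester in shared:
            for snp in results[tester]:
                prof[snp].add(tester)
        return prof

    prof_a = profiles(results_a)
    prof_b = profiles(results_b)

    # Index company B's SNPs by the exact set of positive testers.
    by_profile = defaultdict(list)
    for snp, testers in prof_b.items():
        by_profile[frozenset(testers)].append(snp)

    # A company A SNP is a candidate equivalent of every company B SNP
    # that shares its tester profile.
    return {
        snp: sorted(by_profile[frozenset(testers)])
        for snp, testers in prof_a.items()
        if frozenset(testers) in by_profile
    }


# Invented example data: three hypothetical kits tested at two companies.
chromo2_results = {
    "kit1": {"S116", "S145"},
    "kit2": {"S116"},
    "kit3": {"S21"},
}
geno2_results = {
    "kit1": {"P312", "L21"},
    "kit2": {"P312"},
    "kit3": {"U106"},
}
matches = candidate_equivalents(chromo2_results, geno2_results)
```

With only a handful of shared testers, many unrelated SNPs will share a profile purely by chance, so the output is a list of candidates to check rather than a confirmed set of equivalents. The more testers there are with results from both companies, the fewer false pairings survive.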

The other unknown is whether or not Family Tree DNA will be responding to the competition from Full Genomes and BritainsDNA. Their group administrators' conference is taking place this weekend in Houston, Texas, and the conference schedule has now been made available online. Miguel Vilar from the Genographic Project will be providing a Geno 2.0 update and talking about the Y-2014 tree, and Michael Hammer will be talking about the "implications of the 2014 Y-tree". FTDNA usually make a big announcement at the conference and the speculation is that they will perhaps be announcing the launch of a new Geno chip and/or the introduction of a full Y-chromosome test. Spencer Wells has already indicated that a new Geno chip might be on the way as early as 2014 [2].

Unfortunately, all three currently available Y-SNP tests are very expensive and well beyond the means of the average genetic genealogist. I'm rather hoping that at some point one of the companies will introduce a cheaper Y-SNP test that will allow a customer to have a refined haplogroup designation sufficient to rule out false positive matches but without breaking the bank.

For the moment I would advise anyone considering ordering a Y-SNP test to wait and see what the results are from the tests taken by the early adopters. If you want to join the pioneers and experiment with one of the new SNP tests then you can see a chart comparing the services offered by the main testing companies in the ISOGG Wiki.

With so many exciting new developments I wonder what the Y-chromosome tree will look like in 2014. The ISOGG SNP Index lists all the SNPs that are either on the Y-tree or which are under investigation, but these SNPs represent less than 10% of the known Y-SNPs. David Reynolds maintains the ISOGG Y-SNP Compendium Spreadsheet which currently contains almost 40,000 additional Y-SNPs, and has indicated that he still has over 12,000 SNPs to add, time permitting. The SNPs in this spreadsheet have not all been validated and many are not available for testing at any commercial company. It may well be that the tree will increase in size ten-fold or more in the next twelve months which will represent a significant challenge for the volunteer ISOGG Y-SNP team who maintain the tree in their own free time.

Chris Tyler-Smith cautioned us in February at a special ISOGG presentation at the Sanger Institute in Cambridge that the Y-tree nomenclature system was set to break down in 2013, and indeed that already seems to be the case. He raised the possibility of using an ancestral reference sequence for the Y-chromosome along the lines of the RSRS (Reconstructed Sapiens Reference Sequence) introduced for mitochondrial DNA in 2012 [3]. I wonder if that is something that we will see implemented in 2014.

Whatever the future has in store it is certainly a very exciting time for Y-chromosome researchers and, as Chris Tyler-Smith commented in February, there will be "more opportunities than ever for computer-literate citizen scientists".

References
1. Myres NM, Rootsi S, Lin AA et al. A major Y-chromosome haplogroup R1b Holocene era founder effect in Central and Western Europe. European Journal of Human Genetics 2011; 19 (1): 95-101.
2. Petrone J. National Geographic considering move to new SNP chip for Genographic Project. GenomeWeb, 13 August 2013.
3. Behar DM, Van Oven M, Rosset S et al. A "Copernican" reassessment of the human mitochondrial DNA tree from its root. American Journal of Human Genetics 2012; 90 (5): 936. 

Resources
The ISOGG Y-DNA SNP testing comparison chart
A list of Y-DNA haplogroup projects
BritainsDNA haplogroup nicknames

See also
- A confusion of SNPs

© 2013 Debbie Kennett

2 comments:

Kelly said...

Debbie,
As usual thank you for your excellent synopsis of the current state of Y affairs! I do hope that the nomenclature is standardized as it is hard enough to remember one SNP name let alone 3. Fingers crossed that the FTDNA conference brings welcome news.
Kelly Wheaton

Debbie Kennett said...

Thanks Kelly. I also struggle to remember most of the SNP names apart from those of a few core SNPs that are integral to my Cruwys project. I shall be intrigued to learn what, if anything, FTDNA have in store this weekend. My best guess is a new Geno chip using next generation sequencing rather than the much less accurate chip-based technology used for Geno 2.0 and Chromo 2. It might or might not be significant that David Mittelman, the new FTDNA chief scientist, is doing a presentation on next generation sequencing at the conference.

If the Genographic do come up with a new test I hope they can do something at a reasonable price around the $99 mark rather than $199, which is still way too expensive for most of my project members.

Of course I might be on the wrong track altogether!