The tree is too large to fit into a single screenshot. Here is the relevant portion of the tree for my dad, who is R-Z12, a sub-branch of U106.
Note that this is very much an interim tree. It is based on SNPs tested with the Genographic Project's Geno 2.0 chip, and the cut-off date for inclusion of SNPs is November 2013. The new tree does not include the thousands of new SNPs identified from testing with Big Y, Full Genomes and Chromo 2. The tree will eventually be much more comprehensive, but FTDNA are being careful about the data they use from other sources. They are insisting that SNPs are only added from published data and from raw data that they have verified themselves, rather than from interpreted data. They have promised that at least one update will be released this year. The FTDNA Learning Center will eventually be updated with information about the new haplotree. If you have questions about a particular SNP that is in the wrong place on the tree, or if you spot any other errors, you should send an e-mail to the FTDNA help desk with Y-Tree in the subject line.
FTDNA are now recommending SNPs for people to test. I've only had a chance to look briefly at the SNP recommendations for a few project members, but it is apparent that in some cases the recommended SNPs are not appropriate. SNPs are only recommended if they pass certain percentage thresholds, and there may well be a downstream SNP that would be more suitable. If you are interested in ordering single SNP tests, make sure you join the appropriate Y-DNA haplogroup project and seek advice from the project administrators. Otherwise you could end up wasting money on unnecessary SNPs.
The following information has been provided by Family Tree DNA.
• Created in partnership with National Geographic’s Genographic Project
• Used GenoChip containing ~10,000 previously unclassified Y-SNPs
• Some of those SNPs came from Walk Through the Y and the 1000 Genomes Project
• Used first 50,000 high-quality male Geno 2.0 samples
• Verified positions from 2010 YCC by Sanger sequencing additional anonymous samples
• Filled in data on rare haplogroups using later Geno 2.0 samples
• Expanded from approximately 400 to over 1200 terminal branches
• Increased from around 850 SNPs to over 6200 SNPs
• Cut-off date for inclusion for most haplogroups was November 2013
Total number of SNPs broken down by haplogroup:
• Existing customers receive a free update to predictions and confirmed branches based on existing SNP test results.
• Haplogroup badge updated if new terminal branch is available
• Updated haplotree design displays new SNPs and branches for your haplogroup
• Branch names now listed in shorthand using terminal SNPs
• For SNPs with more than one name, in most cases the original name for SNP was used, with synonymous SNPs listed when you click "More…"
• No longer using SNP names with .1, .2, .3 suffixes. Back-end programming will place SNP in correct haplogroup using available data.
• SNPs recommended for additional testing are pre-populated in the cart for your convenience. Just click to remove those you don’t want to test.
• SNPs recommended for additional testing are based on 37-marker haplogroup origins data where possible, 25- or 12-marker data where 37 markers weren't available.
• Once you've tested additional SNPs, that information will be used to automatically recommend additional SNPs for you if they’re available.
• If you remove those prepopulated SNPs from the cart, but want to re-add them, just refresh your page or close the page and return.
• Only one SNP per branch can be ordered at one time – synonymous SNPs can possibly [be] ordered from the Advanced Orders section on the Upgrade Order page.
• Tests taken have moved to the bottom of the haplogroup page.
• Group Administrator Pages will have longhand haplogroup names removed.
• At least one update to the tree to be released this year.
• Update will include: data from Big Y, relevant publications, other companies' tests from raw data.
• We'll set up a system for those who have tested with other big data companies to contribute their raw data file to future versions of the tree.
• We're committed to releasing at least one update per year.
• The Genographic Project is currently integrating the new data into their system and will announce on their website when the process is complete in the coming weeks. At that time, all Geno 2.0 participants’ results will be updated accordingly and accessible via the Genographic Project website.
Family Tree DNA created the 2014 Y-DNA Haplotree in partnership with the National Geographic Genographic Project using the proprietary GenoChip. Launched publicly in late 2012, the chip tests approximately 10,000 Y-DNA SNPs that had not, at the time, been phylogenetically classified.
The team used the first 50,000 male samples with the highest quality results to determine SNP positions. Using only tests with the highest possible “call rate” meant more available data, since those samples had the highest percentage of SNPs that produced results, or “calls.”
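As a rough illustration of the call-rate filter described above, here is a minimal Python sketch. It is not FTDNA's actual pipeline: the `Sample` record, the use of `None` to mark a no-call, the SNP names and the function names are all hypothetical, chosen only to show how ranking samples by call rate and keeping the best ones would work.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Sample:
    """Hypothetical record: SNP name -> called allele, or None for a no-call."""
    sample_id: str
    calls: Dict[str, Optional[str]] = field(default_factory=dict)


def call_rate(sample: Sample) -> float:
    """Fraction of tested SNPs that produced a result (a 'call')."""
    if not sample.calls:
        return 0.0
    called = sum(1 for allele in sample.calls.values() if allele is not None)
    return called / len(sample.calls)


def select_highest_call_rate(samples: List[Sample], n: int) -> List[Sample]:
    """Keep the n samples with the highest call rate (ties broken by sample_id)."""
    ranked = sorted(samples, key=lambda s: (-call_rate(s), s.sample_id))
    return ranked[:n]


# Made-up example data: three samples tested on the same three hypothetical SNPs.
samples = [
    Sample("A", {"snp1": "T", "snp2": "A", "snp3": None}),
    Sample("B", {"snp1": "T", "snp2": "A", "snp3": "G"}),
    Sample("C", {"snp1": None, "snp2": None, "snp3": "G"}),
]
best = select_highest_call_rate(samples, 2)
print([s.sample_id for s in best])  # ['B', 'A']
```

Here `n` simply stands in for the cut the team made (the first 50,000 high-quality male samples); sample C, with only one call out of three, is the one dropped.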
In some cases, SNPs that were on the 2010 Y-DNA Haplotree didn’t work well on the GenoChip, so the team used Sanger sequencing on anonymous samples to test those SNPs and to confirm ambiguous locations.
For example, if it wasn’t clear whether a clade was a brother (parallel) clade or a downstream clade, they tested to find out.
To base the tree on the most data available at the time, the scope of the project was limited to SNPs already on the GenoChip, with a cut-off for inclusion of about November 2013.
Where data were clearly missing or underrepresented, the team curated additional data from the chip where it was available in later samples. For example, there were very few Haplogroup M samples in the original dataset of 50,000, so to ensure coverage, the team went through eligible Geno 2.0 samples submitted after November 2013 to pull additional Haplogroup M data. That additional research was not necessary for robust datasets such as Haplogroup R, for which they had a significant number of samples.
Family Tree DNA, again in partnership with the Genographic Project, is committed to releasing at least one update to the tree this year. The next iteration will be more comprehensive, including data from external sources such as known Sanger data, Big Y testing, and publications. If the team gets direct access to raw data from other large companies’ tests, then that information will be included as well. We are also committed to at least one update per year in the future.
Known SNPs will not intentionally be renamed. Their original names will be used, since those names reflect the original discoverers of the SNP. If there are two names, one will be chosen for display and the other will be available in the additional data; the team is taking care not to make synonymous SNPs seem as if they are two separate SNPs. Some such cases may exist initially, but as more SNPs are vetted, and as the team learns more, they will be removed.
In addition, positions or markers within STRs, as they are discovered, or large insertion/deletion events inside homopolymers, may also be filtered out of additional data, because the event cannot be proven accurately. A homopolymer is a sequence of identical bases, such as AAAAAAAAA or TTTTTTTTT. In such cases it’s impossible to tell which of the bases was inserted, or whether and where one was deleted. With technology such as Next Generation Sequencing, there is little point in trying to place SNPs in regions such as STRs or homopolymers, because we’re discovering unambiguous SNPs that define the same branches and can use those instead.
Some SNPs from the 2010 tree have been intentionally removed. In some cases, those were SNPs for which the team never saw a positive result, so while such a SNP may be legitimate, even haplogroup-defining, it was outside the current scope of the tree. In other cases, the SNP was found in so many locations that it could cause the tree to be drawn in more than one way. If the SNP could legitimately be positioned in more than one haplogroup, the team deemed it not to be haplogroup-defining but rather a highly polymorphic location.
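To make the homopolymer point concrete, the short Python sketch below (my own illustration, not anything FTDNA have published; the function names are hypothetical) tries inserting or deleting a base at every position of a sequence and groups the positions by the sequence they produce. In a run such as AAAAAAAAA, every insertion of an extra A, and every deletion of an A, gives exactly the same result, so the sequence data alone cannot show where the event happened.

```python
from typing import Dict, List


def insertion_outcomes(seq: str, base: str) -> Dict[str, List[int]]:
    """Map each sequence produced by inserting `base` to the positions that produce it."""
    outcomes: Dict[str, List[int]] = {}
    for pos in range(len(seq) + 1):
        mutated = seq[:pos] + base + seq[pos:]
        outcomes.setdefault(mutated, []).append(pos)
    return outcomes


def deletion_outcomes(seq: str) -> Dict[str, List[int]]:
    """Map each sequence produced by deleting one base to the positions that produce it."""
    outcomes: Dict[str, List[int]] = {}
    for pos in range(len(seq)):
        mutated = seq[:pos] + seq[pos + 1:]
        outcomes.setdefault(mutated, []).append(pos)
    return outcomes


homopolymer = "AAAAAAAAA"
# All 10 insertion positions collapse to one observable sequence.
print(len(insertion_outcomes(homopolymer, "A")))  # 1
# All 9 deletion positions also collapse to one observable sequence.
print(len(deletion_outcomes(homopolymer)))  # 1

# In a mixed sequence most positions can be told apart.
print(len(insertion_outcomes("ACGT", "A")))  # 4
```

Even in ACGT, inserting an A at position 0 or 1 gives the same result (AACGT), because the new A sits next to an existing one; a homopolymer is simply the extreme case where every position is ambiguous.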
SNPs that occur in more than one haplogroup no longer have .1, .2, or .3 designations. For example, J-L147.1 is simply J-L147, and I-L147.2 is simply I-L147. Those SNPs remain positioned in the same places on the tree, but back-end programming will assign the appropriate haplogroup using other available information, such as additional SNPs tested or haplogroup origins listed. If other SNPs have been tested and can unambiguously prove the location of the multi-locus SNP for the sample, then that data is used. If not, matching haplogroup origins information is used.
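The two-step rule described above (use other SNP results if they settle the question, otherwise fall back on haplogroup origins) can be sketched in Python as follows. FTDNA haven't published their back-end code, so everything here is an assumption for illustration: the candidate table, the idea of representing each possible location by a set of upstream SNPs, and the placeholder SNP names are all hypothetical.

```python
from typing import Dict, Optional, Set

# Hypothetical table for one multi-locus SNP: each haplogroup where it can occur,
# with the upstream SNPs a sample placed there would be expected to carry.
CANDIDATES: Dict[str, Set[str]] = {
    "J": {"upstream_j"},
    "I": {"upstream_i"},
}


def place_multilocus_snp(
    candidates: Dict[str, Set[str]],
    positive_snps: Set[str],
    origins_haplogroup: Optional[str],
) -> Optional[str]:
    """Return the haplogroup to use for a multi-locus SNP, or None if unresolved."""
    # Step 1: keep the candidates whose upstream SNPs the sample has tested positive for.
    supported = [hg for hg, upstream in candidates.items() if upstream <= positive_snps]
    if len(supported) == 1:
        # Other SNP results unambiguously fix the location.
        return supported[0]
    # Step 2: otherwise fall back on the haplogroup suggested by haplogroup origins matches.
    pool = supported or list(candidates)
    if origins_haplogroup in pool:
        return origins_haplogroup
    return None


print(place_multilocus_snp(CANDIDATES, {"upstream_j"}, None))  # J
print(place_multilocus_snp(CANDIDATES, set(), "I"))  # I
print(place_multilocus_snp(CANDIDATES, set(), None))  # None
```

The order of the checks mirrors the description: tested SNPs take priority because they are direct evidence, while haplogroup origins matches are only a prediction.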
We will also move exclusively to shorthand haplogroup designations. Since we’re committing to at least one iteration of the tree per year, longhand names, which could change with each update, would be too confusing. For example, Haplogroup O used to have three branches: O1, O2, and O3. A SNP was discovered that combined O1 and O2, so they became O1a and O1b.
There are over 1200 branches on the 2014 Y Haplogroup tree, as compared to about 400 on the 2010 tree. Those branches contain over 6200 SNPs, so we’ve chosen to display select SNPs as “active” with an adjacent “More” button to show the synonymous SNPs if you choose.
The Genographic Project is currently integrating the new data into their system and will announce on their website when the process is complete in the coming weeks. At that time, all Geno 2.0 participants’ results will be updated accordingly and will be accessible via the Genographic Project website.
Elliott Greenspan provided the following quotes in conversation with Janine Cloud, Family Tree DNA's GAP Liaison and Events Co-ordinator:
"I want it to be the most accurate tree it can be, but I also want it to be interesting. That's the key. Historical relevance is what we're to discover. Anthropological relevance. It's not just who has the largest tree, it's who can make the most sense out of what you have [that] is important."
"This year we're committing to launching another tree. This tree will be more comprehensive, utilizing data from external sources: known Sanger data, as well as data such as Big Y, and if we have direct access to the raw data to make the proof (from large companies, such as the Chromo2) or a publication, or something of that nature. That is our intention that it be added into the data."
"We’re definitely committed to update at least once per year. Our intention is to use data from other sources, as well as any SNPs we can, but it must be well-vetted. NGS and SNP technology inherently has errors. You must curate for those errors otherwise you’re just putting slop out to customers. There are some SNPs that may bind to the X chromosome that you didn’t know. There are some low coverages that you didn’t know."
"With technology such as this [next-generation sequencing] you're able to overcome the urge to test only what you’re likely to be positive for, and instead use the shotgun method and test everything. This allows us to make the discovery that SNPs are not nearly as stable as we thought, and they have a larger potential use in that sense."
"Not only does the raw data need to be vetted but it needs to make sense. Using Geno 2.0, I only accepted samples that had the highest call rate, not just because it was the best quality but because it was the most data. I don't want to be looking at data where I'm missing potential information A, or I may become confused by potential information B. That is something that will bog us down. When you’re looking at large data sets, I’d much rather throw out 20% of them because they’re going to take 90% of the time than to do my best to get one extra SNP on the tree or one extra branch modified, that is not worth all of our time and effort. What is, is figuring out what the broader scope of people are, because that is how you break down origins. Figuring one single branch for one group of three people is not truly interesting until it's 50 people, because 50 people is a population. Three people may be a family unit. You have to have enough people to determine relevance. That's why using large datasets and using complete datasets are very, very important."
Update 27 April 2014
A recording of the Family Tree DNA webinar presented by Elise Friedman on the launch of the new 2014 haplotree is now available online and can be accessed here (free registration required).
Related blog posts
- A confusion of SNPs