Friday 22 August 2014

clarifY DNA - a new Y-SNP analysis service

clarifY DNA is a new Y-DNA analysis service from Chris Morley, a well-respected citizen scientist in the genetic genealogy community who is best known for his Geno 2.0 subclade predictor and his experimental Geno 2.0 trees. The methodology is outlined in his white paper "An experimental computer-generated Y-chromosomal phylogeny, leveraging public Geno 2.0 results and the current ISOGG tree". The new service is a natural development of the Geno 2.0 tool and gives users a computer-generated phylogeny based on their next-generation sequencing results. The service is currently restricted to the analysis of Big Y VCF/BED files, but there are plans to add the Full Genomes test (from a text file output) and the Chromo2 test from BritainsDNA in due course. The analysis currently costs $30, which covers the initial analysis and a subscription providing further updates until at least the end of 2014.

You first need to register for an account. Once your payment has been approved and you have uploaded your files, the automated report can be generated. The reports are checked manually before being uploaded to the website, and I understand the turnaround is usually within 24 hours and often much quicker. Once the report is ready, you can download the PDF file from the phylogenetic reports menu.
Here is the tree generated from my dad's Big Y files.
The tree is very clear and easy to understand. It builds on the good work of the ISOGG Y-SNP tree but also adds a provisional perspective: clarifY DNA communicates which aspects are accepted, which are provisional, and which are most in need of further investigation. The tree is also a vast improvement on the current Family Tree DNA haplotree. The FTDNA tree was produced in partnership with the Genographic Project, but its cut-off date was November 2013 and it does not include any of the new SNPs identified from testing with Big Y, Full Genomes and Chromo 2. The FTDNA tree still shows my dad's most downstream SNP as Z12 (a branch of R1b-U106), a SNP he had already tested positive for before taking the Big Y test, so it tells him nothing he did not already know.

According to the clarifY DNA analysis, my dad has 18 private SNPs (all the SNPs highlighted in orange on line 14), the same number of private SNPs identified by the U106 project team. For genealogical purposes it is of course these private SNPs that are of most interest. In the long term, as more people get tested, we should in theory be able to establish precisely where each of these private SNPs sits on the tree, and so reconstruct the complete branching process of our Cruwys/Cruse/Cruise tree right down to the last few hundred years.
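To illustrate why more testers matter, here is a minimal Python sketch. It assumes that "private" simply means a SNP found in one kit and in no other kit in the comparison set. The function name `private_snps` and the SNP labels are hypothetical placeholders, and this is not how clarifY DNA or the U106 project actually calls private SNPs; a real analysis also has to consider coverage (the BED file), because a SNP can only be counted as absent in another kit if that position was actually sequenced.

```python
from __future__ import annotations


def private_snps(kit_snps: set[str], other_kits: list[set[str]]) -> set[str]:
    """Return the SNPs in kit_snps that appear in none of other_kits.

    Simplified illustration (an assumption, not clarifY DNA's method):
    it ignores coverage, so a SNP missing from another kit's set is
    treated as truly absent rather than possibly unsequenced.
    """
    # Union of every SNP seen in any other kit; set().union() with no
    # arguments returns an empty set, so an empty comparison set is safe.
    shared = set().union(*other_kits)
    return kit_snps - shared


# Hypothetical toy data: placeholder SNP names, not real results.
my_kit = {"Z12", "SNP_A", "SNP_B", "SNP_C"}
others = [{"Z12", "SNP_A"}, {"Z12", "SNP_D"}]
print(sorted(private_snps(my_kit, others)))  # ['SNP_B', 'SNP_C']
```

If a newly tested man turns out to share SNP_B, adding his kit to `others` leaves only SNP_C as private. In miniature, that is how a private SNP becomes a new branch on the tree.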

The report includes some of the technical details about how the algorithm works which I've reproduced here for reference:
The contents of this report were produced by a computer algorithm. This report will be frequently re-generated as more information becomes available. The pilot-scale implementation of this algorithm is able to process a dataset of over 4000 Big Y kits (over 400 real and 3600 simulated) in one run. 
clarifY DNA’s automation capabilities analyse large Y-SNP datasets with great speed, great accuracy and great comprehensiveness. These facets are critical for: helping a testing company’s customers make informed SNP-ordering decisions; uniting customers and/or research participants with their most meaningful patrilineal matches; and, overall, scientific progress, customer satisfaction and further growth. 
All in all, clarifY DNA’s software is the key to truly realising the “Y Tree” in “Family Tree”.
The phylogenetic algorithm employed here was initially developed in June 2013 for Geno 2.0 data; see for similar reports (from an earlier version of the phylogenetic algorithm) leveraging public Geno 2.0 data. While this report represents a large advance over existing Y-DNA trees, please treat some aspects of this report as experimental and preliminary; some enhancements specific to next-generation sequencing have not been exhaustively tested, and there are several discrepancies over the definitions of high-level SNPs.
The service also provides the option to contact your closest "genetic neighbours" on your branch of the Y-tree. You can opt to make your kit number and email address available to your neighbours, or you can choose to remain anonymous. If you opt not to reveal your email address, your matches can still send you a message, routed through, and it is then up to you to decide whether or not to reply (thereby revealing your email address).

All in all, this looks like a very promising new service which provides cutting-edge haplogroup analysis in a report that distils the pertinent information into an easy-to-understand phylogenetic tree. The value of the service will grow as more users contribute their data, and I understand that further enhancements are in the pipeline. clarifY DNA will be of particular benefit to people who have taken the Big Y test but who do not have the advantage of belonging to a haplogroup project whose administrators and team members are actively involved in interpreting and analysing Big Y results. Even if you have received a detailed analysis from your project admins, the service is worthwhile for the clarity with which the tree is presented, which helps to put your results in context.

Disclosure: I was given a complimentary analysis of my dad's Big Y data to enable me to write this review.


Family Sleuther said...

Thanks for sharing your review, Debbie. I found this helpful and will consider pursuing their analysis. I wonder if FTDNA has plans to update their haplotree (particularly with Big Y churning out new results)?

Debbie Kennett said...

Many thanks. I understood that FTDNA were supposed to be updating their tree later this year but whether or not that will happen is anyone's guess. I think their hands are somewhat tied because of their collaboration with the Genographic Project so in the meantime we have to rely on third-party services to get a full analysis.

Tiger Mike said...

Thank you. I like this new service's pricing. I'm trying to position it in my mind against FGC's analysis and YFull's in particular. I think this is more akin to YFull's tree. FGC doesn't really publish one that I can see. For people who neither understand nor want to understand all of the gobbledygook, this looks a little simpler but effective tree-wise. Any other thoughts on why this does or doesn't work well for those who simply want to know where they fit in the tree?

Debbie Kennett said...

Mike, I think it is more akin to the YFull service. YFull provide a more comprehensive analysis which includes Y-STRs and mtDNA. The clarifY tree has the advantage of showing where someone's results fit into the hierarchical structure of the tree. If someone has taken the Big Y and wants a clear diagram showing how they fit into the tree and how many private SNPs they have, I think this does the job. The FTDNA tree is so out of date as to be somewhat meaningless, and because the matching is done using the out-of-date tree, the matches are very misleading.

Anonymous said...

Does anyone know whether Chris Morley will continue to publish updates of the full Y tree analysis (including new Geno 2.0 results)?

Debbie Kennett said...

Chris has promised to do updates of the clarifY DNA results. You'll have to ask him about Geno 2.0 results. There's supposed to be a new chip launching some time this year, and not so many people seem to be taking the Geno 2.0 test now.