Tuesday 24 May 2016

New match thresholds for Family Tree DNA's Family Finder test

As a "trusted blogger" I have been given advance notice by Family Tree DNA of forthcoming changes to the match thresholds for the Family Finder autosomal DNA test. The changes are to be rolled out very soon once the final quality control checks have been run. An e-mail will be sent out to project administrators in due course. Here are the details I received from Family Tree DNA:
For several years the genetic genealogy community has asked for adjustments to the matching thresholds in the Family Finder autosomal test. After months of research and testing, we will shortly be implementing some exciting changes.

The current matching thresholds – the minimum amount of shared DNA required for two people to show as a match – are:

● Minimum longest block of at least 7.69 cM for 99% of testers, 5.5 cM for the other one percent

● Minimum 20 total shared centiMorgans 
Some people believed those thresholds to be too restrictive, and through the years requested changes that would loosen those restrictions.

The following changes will be made to the matching programme.

● No minimum shared centiMorgans, but if the cM total is less than 20, at least one segment must be 9 cM or longer.

● If the longest block of shared DNA is greater than 9 cM, the match will show regardless of total shared cM or the number of matching segments.

The entire existing database will be rerun using the new matching criteria, and all new matches will be calculated with the new thresholds.

Most people will see only minor changes in their matches, mostly in the speculative range. They may lose some matches but gain others.
This is very welcome news. This was a change that many of us had asked for and it's good to know that Family Tree DNA have listened to us.

When setting a cut-off limit it is always difficult to get the balance right between false positive and false negative matches, but the previous 20 cM threshold was problematic because all segments right down to 1 cM were included in the total. Family Tree DNA do not currently phase their data (sort the alleles onto the maternal and paternal chromosomes) before assigning matches, and we know that the vast majority of unphased small segments, particularly under 7 cMs, are false positives.(1) Some people were therefore declared as matches when most of the segments they shared were small pseudosegments, and they were unlikely to share a recent common ancestor. In contrast, some legitimate cousin matches were not showing up because they fell just below the threshold. Under the old system two cousins could potentially share a 15 cM segment but not have enough of the small pseudosegments to make up the 20 cM quota. Anecdotally it has been observed that the 20 cM threshold was a particular problem for people with African ancestry, who tend to have fewer of these false coincidental matches on small segments.

Some people were advocating for Family Tree DNA to set the threshold at 7 cMs, but the 9 cM threshold is a sensible compromise. There is still a high false positive rate for unphased 7-9 cM segments, so this will ensure that the reported matches are more likely to be real.

It should also be remembered that, in the vast majority of cases, if you match on a single segment under 10 cMs you will not share a common ancestor within the last ten generations. Even matches of 10 cMs can be very distant.(2) One study found that fewer than 35% of IBD (identical by descent) matches of 10 cMs fall within the last ten generations, and over 30% of segments of this size date back over 20 generations.(3)

I've also noticed in my own data that a lot of the segments in the 7 to 9 cM range seem to fall into large triangulated groups. If these segments are real then this is an indication that they are in what are known as pile up regions. These are regions of the genome where lots of people match because they share the same ethnicity or for some other reason rather than because they share a single recent common ancestor.

Indeed, because of the difficulties in working with unphased segments under 10 cMs many genetic genealogists recommend focusing only on matches who share 10 cMs or more.

I hope to do a comparison of my before and after matches at Family Tree DNA and will be interested to see comparisons from other people, but this is a very welcome and positive change. Thank you Family Tree DNA!

Update
It was not clear from the original announcement but it has now been confirmed that all matches with a total cM count of 20 cMs with a longest segment of 7.69 cMs or more in size will still be reported. Blaine Bettinger has provided a very useful decision tree to clarify the situation in his blog post Family Tree DNA updates matching thresholds. It therefore seems unlikely that many people will lose matches. Note that FTDNA does include all small segments right down to 1 cM in their match thresholds. Most of these smaller segments, and especially those under 5 cMs, are just noise and are best ignored unless you are able to do phasing and very careful chromosome mapping by testing a large number of close family members and known cousins.
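
The old and new rules described above can be sketched in Python. This is only an illustration of one reading of the announcement and the update, not Family Tree DNA's actual algorithm. The function names are hypothetical, "9 cM or longer" is taken as the cut-off, and the 5.5 cM special case and the proprietary adjustments are ignored.

```python
# Thresholds in cM, taken from the announcement and the update quoted above.
OLD_MIN_LONGEST = 7.69
MIN_TOTAL = 20.0
NEW_MIN_LONGEST = 9.0


def is_match_old(longest_cm: float, total_cm: float) -> bool:
    """Old rule: longest block of at least 7.69 cM AND at least 20 cM in total."""
    return longest_cm >= OLD_MIN_LONGEST and total_cm >= MIN_TOTAL


def is_match_new(longest_cm: float, total_cm: float) -> bool:
    """New rule (assumed reading): a block of 9 cM or more qualifies on its own;
    otherwise the old combination of 7.69 cM longest and 20 cM total still applies."""
    if longest_cm >= NEW_MIN_LONGEST:
        return True
    return is_match_old(longest_cm, total_cm)


# (longest block, total shared cM) for three hypothetical pairs of testers.
examples = [
    (15.0, 15.0),  # one solid segment and nothing else
    (8.0, 25.0),   # modest longest block padded out by small segments
    (8.0, 15.0),   # below both rules
]
for longest, total in examples:
    print(longest, total, is_match_old(longest, total), is_match_new(longest, total))
```

The first example is the case discussed earlier: two cousins sharing a single 15 cM segment were missed under the old rule but are reported under the new one.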

Update 25 May 2016
I have received further information about the forthcoming update in an e-mail sent out by Family Tree DNA to all their volunteer group administrators. Here is the relevant section:
We also slightly altered other proprietary portions of the matching algorithm that will, to a small degree, affect block sizes and total shared centiMorgans. These changes should have only marginal effects, if any, on relationships, generally in the distant to remote ranges. 
There’s a separate proprietary formula that is also applied to those with Ashkenazi heritage, but you can, of course, expect to have more new matches than those not of Ashkenazi heritage. 
Please keep in mind this change will not affect close matches, only distant and speculative ones. Some matches will fall off, others will be added. Most people will likely have a net gain of matches. 
Your myOrigins results may change slightly with the rerun, but we have not updated or changed myOrigins yet. We’ll let you know when that happens.

Footnotes
1. See the statistics on false positive matches on the ISOGG Wiki page on identical by descent.
2. See the blog post by Steve Mount on Genetic genealogy and the single segment, On Genetics, 19 February 2011.
3. See Figure 2 in the paper by Doug Speed and David Balding on Relatedness in the post-genomic era: is it still useful? Nature Reviews Genetics 2015 16: 33-44.

Tuesday 17 May 2016

AncestryDNA are to use a new chip for their autosomal DNA test

AncestryDNA have announced that with effect from this week they will be using a new chip for their autosomal DNA test. The announcement was made in a blog post by the Ancestry team, "Customer testing begins on new AncestryDNA chip", published on 12th May.

Some genetic genealogists in the US were invited to attend a conference call with the AncestryDNA team where they were given the chance to ask questions about the changes. For further details read the following two articles:
I will update this list if any further articles are published.

Monday 16 May 2016

Rebranding of BritainsDNA and ScotlandsDNA as MyDNA Global

There appear to be changes afoot at BritainsDNA and ScotlandsDNA following their acquisition by Source BioScience in December 2015.

If you visit the websites of BritainsDNA, ScotlandsDNA and the other associated websites (IrelandsDNA, CymruDNAWales, YorkshiresDNA, IzzardsDNA) you are now greeted with an error message. Here's the message as it appears in Google Chrome after clicking on the Advanced button. 
Here's the message as it appears in Firefox.
If you disregard the warnings and proceed to the websites then they are still functioning as normal. The security certificate mentions a company called MyDNA Global. I took a look at their website and it is a direct copy of the BritainsDNA family of websites but with a new name.


It therefore looks as though Source BioScience, the new owners, are in the process of rebranding the various BritainsDNA websites. There is also a newly created MyDNA Global Facebook page and Twitter account.

There is no change in the product offerings but the websites have been pared down. There is no longer a list of employees and the events page has disappeared. The legal verbiage on all the sites now appears in the name of MyDNA Global.

I've spoken to a couple of people who have tested at BritainsDNA but so far no one has had any communication from the company informing them of the changes.

Source BioScience have now published their annual report for the year ending on 31 December 2015, which provides further information about the acquisition. Here is a quote from page 17:
In December the Group acquired BritainsDNA, a provider of DNA-based ancestry and genealogy products to the consumer market. Source BioScience has been providing the laboratory testing and analysis for BritainsDNA for a number of years. The acquisition will deliver incremental revenue and increased operational efficiency for this business. The commercial activities will be migrated across to the Group’s e-commerce platform and e-Shop early in 2016.
The financial details of the acquisition are given on page 77. It would appear that the Moffat Partnership, the parent company of BritainsDNA (now renamed as Source BioScience Scotland), had considerable financial liabilities amounting to £570,000 but little in the way of tangible and intangible assets. These liabilities appear to have been offset by a goodwill valuation of £584,000.

There have not been any announcements from Source BioScience about their future plans for BritainsDNA/MyDNA Global so it will be interesting to see what happens in the coming months. If anyone has any further information do let me know.

Friday 6 May 2016

AncestryDNA's updated matching algorithms - a before and after analysis

AncestryDNA rolled out their long-awaited new matching algorithms on Tuesday this week. This message will now greet you when you log into your AncestryDNA account.
Ancestry have provided a number of resources to describe the changes, all of which merit a close reading:
AncestryDNA have been able to make these improvements because they have such a massive autosomal DNA database. They have now tested nearly two million people. Their scientists have been able to exploit the power of this large database to provide new insights into relatedness and to improve the detection of genealogically relevant IBD segments.

The biggest change is an improvement in the phasing process. Phasing is the process of sorting out the DNA letters (the As, Cs, Ts and Gs) and placing them on the maternal and paternal chromosomes. Phasing is important for ruling out false positive and false negative matches. AncestryDNA are now using a reference panel of more than 300,000 genotypes for their phasing. Previously they were using a "window" system for IBD detection which broke the large segments into too many small pieces. Now they are using a SNP-based system which provides more realistic results with fewer segments. Phasing can be done with reference panels with a high degree of accuracy: the error rate of Ancestry's Underdog phasing engine is less than 1%. The accuracy will increase as the reference panel grows in size.

The matching threshold has also been changed. Two people must now share a minimum of 6 cMs whereas the old threshold was 5 cMs. AncestryDNA have produced a revised table of confidence scores based on a new understanding of the amount of DNA shared between different relations.


Contrast the above scores with the old version of the chart below which, to my mind, was always overly optimistic, especially about the matches on segments under 20 cMs, the vast majority of which are actually shared with very distant cousins. (For more on this subject watch Dr Doug Speed's lecture Who's your cousin? Using DNA to determine relatedness which he presented at Who Do You Think You Are? Live this year.)


Comparing matches before and after
I thought it would be an interesting exercise to compare my matches before and after the update. Unlike Family Tree DNA and 23andMe, AncestryDNA do not provide a facility for customers to download their match list. Fortunately Rob Warthen from DNAGedcom has provided a tool known as the DNAGedcom Client, which allows us to download all our data from Ancestry, including details of the shared cM count and the number of shared segments. I downloaded my list of matches on 19th April. I ran the DNAGedcom Client again on 4th May, and I've compared the two datasets to see how many matches I've gained and lost.

Here is a comparison of the number of matches I had before and after the update:

| Date | Matches | 4th cousins | Distant cousins | Shaky leaf hints | Circles | NADs |
|---|---|---|---|---|---|---|
| 4 May | 3423 | 18 | 3405 | 1 | 0 | 0 |
| 19 April | 3414 | 28 | 3386 | 1 | 0 | 0 |

There was a marginal increase in the number of matches, but a close analysis of these matches provides a different perspective. I actually lost 1169 (34%) of my matches. However, this is more than made up for by the fact that I have gained 1178 new matches.
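
A before-and-after comparison of this kind boils down to two set differences. The sketch below is a Python equivalent under stated assumptions, not the Excel formula that was actually used: the match identifiers are made up, and `compare_match_lists` is a hypothetical helper name.

```python
def compare_match_lists(before, after):
    """Return (lost, gained, retained) sets of match identifiers."""
    before_ids, after_ids = set(before), set(after)
    lost = before_ids - after_ids        # in the old list only
    gained = after_ids - before_ids      # in the new list only
    retained = before_ids & after_ids    # in both lists
    return lost, gained, retained


# Made-up identifiers standing in for the two downloaded match lists.
before = ["m1", "m2", "m3", "m4"]
after = ["m2", "m3", "m5"]
lost, gained, retained = compare_match_lists(before, after)
print(sorted(lost), sorted(gained), sorted(retained))

# Consistency check on the figures quoted above:
# 3414 matches before, 1169 lost and 1178 gained gives 3423 after.
assert 3414 - 1169 + 1178 == 3423
print(round(100 * 1169 / 3414))  # share of the old list that was lost, in percent
```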

This is a breakdown of the size of the segments I share with my matches before and after the update:

| Date | < 6 cMs | 6-6.99 cMs | 7-9.9 cMs | 10-14.9 cMs | >15 cMs | Matches |
|---|---|---|---|---|---|---|
| 4 May | 0 | 1518 | 1456 | 381 | 68 | 3423 |
| 19 April | 1737 | 704 | 719 | 214 | 40 | 3414 |
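
A tally like this can be produced from a list of shared cM values with a few lines of Python. The sketch below is illustrative only. The bin edges (6, 7, 10 and 15 cM) are an assumption based on the column headings, and the sample values are invented rather than taken from the downloaded data.

```python
from bisect import bisect_right
from collections import Counter

# Assumed bin edges in cM; a value equal to an edge goes into the higher bin.
BIN_EDGES = [6.0, 7.0, 10.0, 15.0]
BIN_LABELS = ["< 6", "6-6.99", "7-9.99", "10-14.99", ">= 15"]


def bin_label(shared_cm: float) -> str:
    """Return the label of the size bin that a shared cM value falls into."""
    return BIN_LABELS[bisect_right(BIN_EDGES, shared_cm)]


def tally(shared_cms):
    """Count how many matches fall into each bin, keeping the bins in order."""
    counts = Counter(bin_label(cm) for cm in shared_cms)
    return {label: counts.get(label, 0) for label in BIN_LABELS}


# Invented shared cM values for six matches.
sample = [5.2, 6.0, 6.5, 9.9, 12.0, 23.9]
print(tally(sample))
```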

I thought it would be interesting to do a further breakdown of my matches who were predicted to be fourth cousins or closer. Note that what AncestryDNA describe as a fourth cousin can in fact be anything from a fourth to a sixth cousin.

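| Relationship before | Relationship after | cMs before | cMs after | Segments before | Segments after |
|---|---|---|---|---|---|
| 3rd cousin | 3rd cousin | 109.71 | 117.197 | 8 | 5 |
| 3rd cousin | 3rd cousin | 84.358 | 98.3169 | 4 | 4 |
| 4th cousin | 4th cousin | 52.361 | 61.0445 | 4 | 3 |
| 4th cousin | 4th cousin | 23.947 | 30.4435 | 4 | 3 |
| 4th cousin | 4th cousin | 23.695 | 29.4889 | 1 | 1 |
| 4th cousin | 4th cousin | 21.776 | 27.2691 | 1 | 1 |
| 4th cousin | 4th cousin | 24.098 | 25.2915 | 2 | 2 |
| 4th cousin | 4th cousin | 22.325 | 24.4259 | 1 | 1 |
| Distant cousin | 4th cousin | 11.364 | 23.8771 | 3 | 2 |
| 4th cousin | 4th cousin | 20.065 | 23.8634 | 1 | 1 |
| 4th cousin | 4th cousin | 20.704 | 23.1975 | 2 | 2 |
| 4th cousin | 4th cousin | 17.159 | 22.9076 | 2 | 2 |
| Distant cousin | 4th cousin | 13.204 | 22.8962 | 2 | 1 |
| Distant cousin | 4th cousin | 13.653 | 22.2734 | 3 | 2 |
| 4th cousin | 4th cousin | 18.581 | 21.1956 | 1 | 1 |
| 4th cousin | 4th cousin | 22.604 | 20.9644 | 1 | 1 |
| 4th cousin | 4th cousin | 20.896 | 20.8483 | 2 | 1 |
| 4th cousin | 4th cousin | 18.033 | 20.6506 | 1 | 1 |
| 4th cousin | Distant cousin | 24.068 | 13.0545 | 1 | 1 |
| 4th cousin | Distant cousin | 19.149 | 19.3702 | 2 | 1 |
| 4th cousin | Distant cousin | 18.629 | 18.2796 | 1 | 1 |
| 4th cousin | Distant cousin | 18.474 | 18.6107 | 1 | 1 |
| 4th cousin | None | 24.538 | | 1 | |
| 4th cousin | None | 23.731 | | 1 | |
| 4th cousin | None | 23.099 | | 1 | |
| 4th cousin | None | 21.978 | | 1 | |
| 4th cousin | None | 21.285 | | 1 | |
| 4th cousin | None | 21.093 | | 1 | |
| 4th cousin | None | 20.878 | | 1 | |
| 4th cousin | None | 20.172 | | 1 | |
| 4th cousin | None | 19.937 | | 1 | |
| 4th cousin | None | 18.071 | | 1 | |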

As can be seen, for the matches that have been retained there has been a marginal increase in the cM count. Four matches have been downgraded from fourth cousins to distant cousins. Ten of my previous fourth cousins (35%) have disappeared from my match list completely. It may be that these matches were filtered out because of the improved phasing. Another possibility is that these segments were in SNP-poor regions. Ancestry explain in their white paper that matches in these regions are unreliable. To counteract this problem they "discount these matches by reducing their total length (in cM)". These matches are no great loss. All these fourth cousins were in America and it was impossible to find any sort of genealogical relationship despite the fact that some of these matches had huge and very detailed trees. I'd rather suspected that these matches must be very distant, if they were legitimate at all. I already have far more matches than I know what to do with and I can still only find the genealogical connection with two of my matches at AncestryDNA. I would much rather have fewer and more accurate matches.

Conclusion
It's important to remember that we are all pioneers in this field, the tests are in their infancy and we still have much to learn.

At Family Tree DNA, 23andMe and GedMatch we are used to working with unphased data which can produce false positive matches, particularly on smaller segments under 15 cMs. Ancestry are the only company who filter out the high-frequency matches which are not of genealogical relevance, though 23andMe do screen out some matches in known problem areas. IBD segments with high rates of matching are likely to be less useful for detecting relationships in a recent genealogical timeframe.

Without phasing and without frequency filters it is much easier for people to find false coincidental matches, but we all need to be very careful about jumping to conclusions, especially with more distant relationships, where it is so much more difficult to detect recent IBD with the currently available tests.

This is the second time that AncestryDNA have updated their algorithms. Family Tree DNA have already changed their algorithms once, which resulted in some lost matches. We should all expect to see further changes to the companies' matching algorithms in the future as they strive to improve the technology and produce more accurate results.

Further reading
Thumbs up; AncestryDNA improves genetic matching technology - a review by Diahan Southard, 9 May 2016.

Acknowledgements
Thanks to Don Worth in the ISOGG Facebook group for sharing his Excel formula for calculating the number of lost matches.

© 2016 Debbie Kennett