For several years the genetic genealogy community has asked for adjustments to the matching thresholds in the Family Finder autosomal test. After months of research and testing, we will shortly be implementing some exciting changes.
The current matching thresholds – the minimum amount of shared DNA required for two people to show as a match – are:
● Minimum longest block of at least 7.69 cM for 99% of testers, 5.5 cM for the other one percent
● Minimum 20 total shared centiMorgans
Some people believed those thresholds to be too restrictive, and through the years requested changes that would loosen those restrictions.
The following changes will be made to the matching programme.
● No minimum shared centiMorgans, but if the cM total is less than 20, at least one segment must be 9 cM or longer.
● If the longest block of shared DNA is greater than 9 cM, the match will show regardless of total shared cM or the number of matching segments.
The entire existing database will be rerun using the new matching criteria, and all new matches will be calculated with the new thresholds.
Most people will see only minor changes in their matches, mostly in the speculative range. They may lose some matches but gain others.
This is very welcome news. This was a change that many of us had asked for and it's good to know that Family Tree DNA have listened to us.
When setting a cut-off limit it is always difficult to get the balance right between false positive and false negative matches, but the previous 20 cM threshold was problematic because all segments right down to 1 cM were included in the total. Family Tree DNA do not currently phase their data (sort the alleles into the maternal and paternal chromosomes) before assigning matches, and we know that the vast majority of unphased small segments, particularly those under 7 cMs, are false positives.(1) Some people were therefore declared as matches when most of the segments they shared were small pseudosegments, and they were unlikely to share a recent common ancestor. In contrast, some legitimate cousin matches were not showing up because they fell just below the threshold. Under the old system two cousins could potentially share a 15 cM segment but not have enough of the small pseudosegments to make up the 20 cM quota. Anecdotally it has been observed that the 20 cM threshold was a particular problem for people with African ancestry, who tend to have fewer of these false coincidental matches on small segments.
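To make the 15 cM example concrete, here is a minimal Python sketch of the two rules as I read them from the announcement. It is not Family Tree DNA's code: the function names and the cousins' segment lengths are made up for illustration, the 5.5 cM exception for one percent of testers is ignored, and I am assuming that "9 cM or longer" means greater than or equal to 9 cM.

```python
from typing import Sequence

# Illustrative sketch only, not FTDNA's actual matching code.
# Thresholds are the figures quoted in the announcement.
OLD_LONGEST_MIN_CM = 7.69  # old minimum longest block (99% of testers)
OLD_TOTAL_MIN_CM = 20.0    # old minimum total shared cM
NEW_LONGEST_MIN_CM = 9.0   # new single-segment threshold (assumed inclusive)


def old_rule(segments_cm: Sequence[float]) -> bool:
    """Old thresholds: longest block and total shared cM must both qualify."""
    if not segments_cm:
        return False
    return (max(segments_cm) >= OLD_LONGEST_MIN_CM
            and sum(segments_cm) >= OLD_TOTAL_MIN_CM)


def long_segment_rule(segments_cm: Sequence[float]) -> bool:
    """New rule: one segment of 9 cM or more is enough, whatever the total."""
    return bool(segments_cm) and max(segments_cm) >= NEW_LONGEST_MIN_CM


# Hypothetical cousins: one 15 cM segment plus two tiny ones (17.7 cM total).
cousins = [15.0, 1.5, 1.2]
print(old_rule(cousins))           # False: total is below 20 cM
print(long_segment_rule(cousins))  # True: longest segment is at least 9 cM
```

The sketch only covers the long-segment case. It says nothing about how matches with a total of 20 cM or more and a longest segment under 9 cM are treated.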
Some people were advocating for Family Tree DNA to set the threshold at 7 cMs, but the 9 cM threshold is a sensible compromise. There is still a high false positive rate for unphased 7-9 cM segments, so this will ensure that the reported matches are more likely to be real.
It should also be remembered that, in the vast majority of cases, if you match on a single segment under 10 cMs you will not share a common ancestor within the last ten generations. Even matches of 10 cMs can be very distant.(2) One study found that fewer than 35% of IBD (identical by descent) matches of 10 cMs fall within the last ten generations, and over 30% of segments of this size date back over 20 generations.(3)
I've also noticed in my own data that a lot of the segments in the 7 to 9 cM range seem to fall into large triangulated groups. If these segments are real then this is an indication that they are in what are known as pile-up regions. These are regions of the genome where lots of people match because they share the same ethnicity, or for some other reason, rather than because they share a single recent common ancestor.
Indeed, because of the difficulties in working with unphased segments under 10 cMs, many genetic genealogists recommend focusing only on matches who share 10 cMs or more.
I hope to do a comparison of my before and after matches at Family Tree DNA and will be interested to see comparisons from other people, but this is a very welcome and positive change. Thank you Family Tree DNA!
Update
It was not clear from the original announcement, but it has now been confirmed that all matches with a total of 20 cMs or more and a longest segment of 7.69 cMs or more will still be reported. Blaine Bettinger has provided a very useful decision tree to clarify the situation in his blog post Family Tree DNA updates matching thresholds. It therefore seems unlikely that many people will lose matches. Note that FTDNA does include all small segments right down to 1 cM in their match thresholds. Most of these smaller segments, and especially those under 5 cMs, are just noise and are best ignored unless you are able to do phasing and very careful chromosome mapping by testing a large number of close family members and known cousins.
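Putting the announcement and this confirmation together, the combined rule can be sketched in a few lines of Python. This is my own reading of the text, not FTDNA's implementation and not a reproduction of Blaine's decision tree. The function name is hypothetical, the 9 cM boundary is assumed to be inclusive, and the 5.5 cM exception and the other proprietary adjustments are left out.

```python
from typing import Sequence


def is_reported(segments_cm: Sequence[float]) -> bool:
    """Hypothetical combined rule: is a pair reported as a match?

    segments_cm holds the lengths in cM of every shared segment,
    including the small ones right down to 1 cM.
    """
    if not segments_cm:
        return False
    longest = max(segments_cm)
    total = sum(segments_cm)
    # New rule: a single long segment is enough on its own.
    if longest >= 9.0:
        return True
    # Retained old rule: 20 cM in total with a longest block of 7.69 cM.
    return total >= 20.0 and longest >= 7.69


print(is_reported([15.0, 1.5, 1.2]))                     # True
print(is_reported([7.7, 3.0, 2.5, 2.0, 2.0, 1.6, 1.5]))  # True
print(is_reported([8.0, 2.0]))                           # False
```

The second example is the kind of match that would have been lost if only the 9 cM rule applied. It stays because the old 20 cM and 7.69 cM criteria are still honoured.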
Update 25 May 2016
I have received further information about the forthcoming update in an e-mail sent out by Family Tree DNA to all their volunteer group administrators. Here is the relevant section:
We also slightly altered other proprietary portions of the matching algorithm that will, to a small degree, affect block sizes and total shared centiMorgans. These changes should have only marginal effects, if any, on relationships, generally in the distant to remote ranges.
There’s a separate proprietary formula that is also applied to those with Ashkenazi heritage, but you can, of course, expect to have more new matches than those not of Ashkenazi heritage.
Please keep in mind this change will not affect close matches, only distant and speculative ones. Some matches will fall off, others will be added. Most people will likely have a net gain of matches.
Your myOrigins results may change slightly with the rerun, but we have not updated or changed myOrigins yet. We’ll let you know when that happens.
See also
- More Family Finder matches by Dave Dowell
- FTDNA changes matching system by Judy Russell
- Family Finder matching thresholds changing at FTDNA by Roberta Estes
Footnotes
1. See the statistics on false positive matches on the ISOGG Wiki page on identical by descent.
2. See the blog post by Steve Mount on Genetic genealogy and the single segment. On Genetics, 19 February 2011.
3. See Figure 2 in the paper by Doug Speed and David Balding on Relatedness in the post-genomic era: is it still useful? Nature Reviews Genetics 2015 16: 33-44.
Your blog and Blaine's Chart are really good to read and to see.
Thank you very much to BOTH of you.
Gail Riddell
Thank you Gail.