Tuesday 1 September 2020

The AncestryDNA matching updates have now been completed

I wrote back in mid-July that AncestryDNA would be updating their matching algorithms to provide information on the length of the longest segment and a more accurate tally of the number of matching segments. AncestryDNA also announced that they would no longer be reporting matches that shared a total of less than 8 cM after the application of the Timber algorithm. These changes were rolled out gradually in August, with the small matches finally disappearing shortly before midnight last night UK time.

Yesterday afternoon, before the small matches disappeared, I made a note of the number of matches at AncestryDNA for me, my mum and my dad. I've done a before-and-after comparison along with a comparison of the number of matches, where available, at the other testing companies. The number of 4th cousin or closer matches at AncestryDNA remains unchanged.

I've lost 66% of my matches at AncestryDNA but in reality this is no great loss as so many of these small matches are false matches which don't match either of my parents. Even when the person does match one of my parents I often find that the documentary link is on the wrong side, for example, the person has a DNA match with my dad but I've identified a genealogical link on my mum's side. Even if these small matches are valid, they are far more likely to trace back 10, 20 or 30 generations rather than fall within a useful genealogical timeframe. There are currently no tools which can determine the age of a single segment match and tell us whether we are matching a fifth cousin rather than a twentieth cousin. It's impossible to work with such small DNA matches when probably 95% or more of them are either false matches or very old matches. With whole genome sequencing we will probably have the ability to make these distinctions but that is currently a long way off.  

AncestryDNA previously set a much lower threshold for matching than 23andMe, FamilyTreeDNA and MyHeritage so this update now brings them more into line with the other companies. Ancestry have by far the largest database with over 18 million people tested so it's not surprising that my family have far more matches there than at any other company even after the purge. I was surprised to find that my transfer kit at MyHeritage had nearly 2000 more matches than the test I did directly with the company when I ordered their Health and Ancestry test. 23andMe restrict the number of matches to 2000 and this total includes people in the database who have not opted in to relative matching. However, they have just launched a new invite-only subscription Premium Membership which will provide new health reports as well as additional ancestry features such as the ability to view four times more DNA Relatives. For details see this page on the 23andMe website though you will need to be logged into your 23andMe account to view the page. If this trial is successful we may well see other companies offering access to a more extensive match list for a fee though I suspect that for the vast majority of AncestryDNA users a list of 10,000 or more matches is more than they can realistically handle.  

I've not yet had much time to look at the new information about the number of matching segments and the length of the longest segment. However, if a match shares only a single segment we can now get an idea of how Ancestry's Timber algorithm works. The longest segment is identified before Timber is applied whereas the total cM shared is reported after Timber, so for a single-segment match any difference between the two figures shows how much Timber has removed. Timber has the effect of downweighting regions where there are large numbers of matches. Matches are only likely to be genealogically relevant if they fall in a region which is shared with just a few cousins in your family rather than being shared with large numbers of people in the general population. Timber is only applied to matches sharing 90 cM or less. For full details see the updated AncestryDNA Matching White Paper.

Unless you're from an endogamous population you'll probably find that Timber has had little or no effect on most of your matches. For my match below, the longest segment size is identical to the total cM shared.

In other cases I am finding minor discrepancies in the matches, sometimes of just one or two cM or, as in the case below, a small reduction of just 0.1 cM.

However, I have found one single segment match where there was a sizeable discrepancy.

This match lives in Canada and has ancestry from Scotland, Wales, Norway and Newfoundland. I can see that the match is on my maternal side. For my mum the match has been similarly reduced in size from 56 cM to 38 cM. My mum has no known ancestry from Scotland, Wales or Norway and I'm not aware of any maternal ancestors or relatives who emigrated to Newfoundland. It seems unlikely that I will be able to document a connection and the fact that the match has been so drastically reduced is probably a red flag that this match should be treated with caution. I've clicked through to look at quite a few more matches but have not found any others with quite such a big discrepancy though I've found a few matches where there is a difference of 5 or 10 cM.
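
For anyone who keeps their match details in a spreadsheet, the comparison I've been doing by hand could be sketched in a few lines of Python. This is only a rough illustration and not anything Ancestry provides: the `Match` record, the function names and the 10 cM flagging threshold are my own assumptions, and apart from my mum's 56 cM and 38 cM figures the sample values are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Match:
    name: str          # hypothetical label for the match
    total_cm: float    # total cM shared as reported (after Timber)
    longest_cm: float  # longest segment as reported (before Timber)
    segments: int      # number of matching segments


def timber_reduction(match: Match) -> float:
    """cM by which the longest segment exceeds the reported total (0 if none)."""
    return max(0.0, round(match.longest_cm - match.total_cm, 1))


def flag_matches(matches, threshold_cm=10.0):
    """Names of single-segment matches with a sizeable reduction.

    The 10 cM threshold is an arbitrary assumption; Timber is only
    applied to matches sharing 90 cM or less, so larger ones are skipped.
    """
    return [
        m.name
        for m in matches
        if m.segments == 1
        and m.total_cm <= 90
        and timber_reduction(m) >= threshold_cm
    ]


# Illustrative values only, apart from the 56 cM / 38 cM pair
matches = [
    Match("no change", total_cm=20.0, longest_cm=20.0, segments=1),
    Match("tiny change", total_cm=19.9, longest_cm=20.0, segments=1),
    Match("mum's Canadian match", total_cm=38.0, longest_cm=56.0, segments=1),
]

for m in matches:
    print(m.name, timber_reduction(m))
print("Treat with caution:", flag_matches(matches))
```

The check is restricted to single-segment matches because that is the only case where the longest segment and the total cM shared describe the same stretch of DNA.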

It will be interesting to see if people are able to make use of the longest segment data. I think it might be helpful, as in the example above, in highlighting matches that appear to fall into problem areas and which are likely to be less useful for genealogical purposes. See for example this very interesting blog post from Kalani Mondoy where he has shown how useful the longest segment data has been for him to distinguish between his genuine Hawaiian matches and the very distant Maori matches which are indicative of shared ancestry from about a thousand years ago before the two populations split.

I would hope that AncestryDNA will eventually be able to use the longest segment data to refine the matches for people with ancestry from endogamous populations. I have access to a British Ashkenazi Jewish account at Ancestry where the individual previously had 224,377 matches. After the match reduction there are still 169,928 matches remaining. There is clearly great scope for improving the matching for these populations.
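
To put those two figures in proportion, here is the arithmetic as a short Python sketch. The function name `percent_reduction` is my own; the two match counts are the ones quoted above.

```python
def percent_reduction(before: int, after: int) -> float:
    """Percentage of matches lost between two counts."""
    return (before - after) / before * 100


before, after = 224_377, 169_928  # match counts for the Ashkenazi account
lost = before - after

print(f"Matches lost: {lost:,}")
print(f"Reduction: {percent_reduction(before, after):.1f}%")  # about 24.3%
```

In other words this account has lost only about a quarter of its matches, compared with the 66% that I lost.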

The reduction in matches at AncestryDNA proved surprisingly controversial with some people, particularly those of African American heritage, arguing passionately for their retention. See, for example, this blog post by Fonte Felipe. However, the reduction has taken place and the decision is not likely to be reversed. We need to focus on what we can do and not what we can't do. How are you making use of the new segment data and the information about the longest segment? What tips do you have for making the most of your AncestryDNA matches? Do let me know what you think.

Update 3rd September 2020
AncestryDNA confirmed today in a conference call that they will soon be showing us the unweighted pre-Timber total cM shared. This will allow us to see at first hand how much of an effect, if any, the Timber algorithm is having on our matches. No exact date has been promised but the information is expected to be added to our accounts in the next two weeks or so.

Further reading