Thursday, 20 November 2014

Improved cousin matching at AncestryDNA

Yesterday afternoon AncestryDNA rolled out their much-anticipated new matching algorithms. When I wrote last week about AncestryDNA at Back To Our Past in Ireland I reckoned that I had about 7400 matches (148 pages of matches at 50 matches per page). Now when I check into my AncestryDNA account I find that I have a much more manageable 1100 matches (22 pages of matches). This represents an 85% reduction in the number of matches. Previously I had matches with 20 people who were predicted to be fourth to sixth cousins. Now I have just seven matches with fourth to sixth cousins. The remainder of my matches are predicted to be fifth to distant cousins. Thirty-three of these distant cousins are described as high confidence matches. The remainder are shown as good confidence matches. Here is a screenshot of my AncestryDNA home page.
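For anyone who wants to repeat the sum with their own page counts, here is a minimal Python sketch of the arithmetic. It assumes every page of the match list is full, so the totals are estimates only, and the function names are my own rather than anything provided by Ancestry.

```python
MATCHES_PER_PAGE = 50  # each page of the match list shows 50 matches


def estimate_matches(pages, per_page=MATCHES_PER_PAGE):
    """Rough estimate: assumes every page, including the last, is full."""
    return pages * per_page


def percent_reduction(before, after):
    """Percentage drop from `before` to `after`."""
    return 100 * (before - after) / before


old_total = estimate_matches(148)  # 7400
new_total = estimate_matches(22)   # 1100
print(old_total, new_total, round(percent_reduction(old_total, new_total)))
# prints: 7400 1100 85
```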

As a side note I find that I can no longer access AncestryDNA from my Ancestry.co.uk subscriber account. When I click on the DNA button I am taken to the usual page which advises me that AncestryDNA is not yet available outside the United States.

I used to be able to click on the orange DNA.ancestry.com button to view my DNA results. Now when I click on this button I get redirected to the URL http://dna.ancestry.com/interim but the page doesn't load. After much frustration yesterday I eventually discovered that I could access my results by going direct to http://dna.ancestry.com. It may be that these changes have been made to prepare the website for the launch in the UK and Ireland. (The AncestryDNA test is currently only available in the US, though a few non-Americans like me managed to place an order in the early beta-testing days before shipping outside the US was stopped. It is also possible to order a kit by using one of the package forwarder services.)

Ancestry have done a good job of providing some FAQs to explain how the new matching process works in easy-to-understand language.

They have also provided a very informative table showing the explanation of the different confidence levels. For the first time they have given us the all-important information on the range of shared centiMorgans for each of the confidence scores.

In addition there is a very detailed technical white paper which explains the methodology behind the new phasing and matching algorithms. Phasing is the process of determining which DNA you inherited from your mother and which DNA you inherited from your father. There is a more detailed explanation in the ISOGG Wiki. If you have tested yourself and your parents then you can do your own phasing but currently none of the testing companies use phased trio data. The Ancestry approach involves inferring the inheritance from reference populations. It is a computationally intensive exercise and Ancestry are currently the only company who phase their customers' data before doing the matching. The new matching algorithms are based not just on the amount of DNA shared but also on the frequency of the segments in the database. Ancestry have found that there are some segments of DNA that are shared by large numbers of people and these segments are likely to be indicative of ancient ancestry rather than the sharing of a recent ancestor in a genealogical timeframe. Because 99.% of the AncestryDNA database is in America I've never been able to do anything with my matches, but the new algorithms certainly seem to be a very useful improvement, and I hope that I will reap the benefits when Ancestry start to sell their test in the UK.
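To give a feel for the idea of taking segment frequency into account, here is a toy Python sketch. It is emphatically not Ancestry's algorithm, for which the white paper is the only authoritative source. The region labels, the threshold of three people and the weighting rule are all assumptions invented for illustration: a segment in a region shared by many people simply counts for less towards the total.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    match_id: str  # hypothetical identifier for the person matched
    region: str    # hypothetical label for a stretch of a chromosome
    cm: float      # length of the shared segment in centiMorgans


def adjusted_totals(segments, pileup_threshold=3):
    """Sum shared cM per match, down-weighting widely shared regions.

    Toy rule (an assumption, not Ancestry's method): if n people share
    a region and n exceeds the threshold, each segment in that region
    is scaled by threshold / n.
    """
    people_in_region = {}
    for seg in segments:
        people_in_region.setdefault(seg.region, set()).add(seg.match_id)

    totals = {}
    for seg in segments:
        n = len(people_in_region[seg.region])
        weight = min(1.0, pileup_threshold / n)
        totals[seg.match_id] = totals.get(seg.match_id, 0.0) + seg.cm * weight
    return totals


# Six people share region "A"; only m1 also shares region "B".
example = [Segment(f"m{i}", "A", 8.0) for i in range(1, 7)]
example.append(Segment("m1", "B", 10.0))
print(adjusted_totals(example))
```

In this made-up example the six matches who share only the crowded region each drop from 8 cM to 4 cM, while the match who also shares a segment in a quiet region keeps that segment at full weight.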

The old matches have not yet been lost completely and it is now possible to download a spreadsheet with a list of all your Version 1 matches. This facility will be available for a limited time. The spreadsheet lists the Ancestry user names of your matches, the user names of the admins of the accounts, and the predicted relationship range. There are additional columns labelled "Starred", "Viewed", "Hint", "Archived" and "Note". The spreadsheet is available via the settings menu with the little gear icon on your AncestryDNA home page. When I downloaded the spreadsheet I found that I actually had 9449 matches. I'm not quite sure why there were so many more matches in the spreadsheet than I'd previously estimated by extrapolating from the number of pages in my match list. However, this new figure actually means that I've seen an 88% reduction in the number of my matches. It would be helpful if Ancestry could let us have the ability to download a spreadsheet with a list of all our new matches as well, and preferably with additional details such as the matching surnames. Both 23andMe and Family Tree DNA allow us to download a list of our matches. It is much easier to scan through a spreadsheet rather than clicking on each individual match page.
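If you are comfortable with a little scripting, a spreadsheet like this can be counted and summarised in a few lines of Python. The sketch below assumes the download is a CSV file with a header row. The headings "Match", "Admin" and "Predicted relationship" and the sample rows are invented for illustration, so check them against your own file before relying on this.

```python
import csv
import io
from collections import Counter

# Hypothetical sample standing in for the downloaded file; the real
# column headings and file format may differ.
SAMPLE = """Match,Admin,Predicted relationship,Starred,Viewed,Hint,Archived,Note
user_a,user_a,4th-6th cousin,,Yes,,,
user_b,admin_b,Distant cousin,,,,,
user_c,user_c,Distant cousin,Yes,,,,
"""


def load_matches(csv_text):
    """Return one dict per match, keyed by the column headings."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def tally_by_relationship(rows, column="Predicted relationship"):
    """Count how many matches fall into each predicted relationship range."""
    return Counter(row[column] for row in rows)


rows = load_matches(SAMPLE)
print(len(rows))                    # total number of matches in the file
print(tally_by_relationship(rows))  # matches per relationship range
```

To use it on a real download you would read the file's contents into a string first, for example with `open(path, newline="").read()`.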

DNA Circles
The other new feature that has been introduced with this update is called DNA Circles. I'm not yet in any DNA Circles so I can't explore how this feature works. However, Roberta Estes, Blaine Bettinger, Judy Russell and Diahan Southard have all written about the DNA Circles and I suggest you read their articles to find out more:

- Ancestry's better mousetrap - DNA circles by Roberta Estes
- Goodbye false positives! AncestryDNA updates its matching algorithm by Blaine Bettinger
- Changes at AncestryDNA by Judy Russell
- AncestryDNA Review and Breaking News! Updates Launched by Diahan Southard

There is also a post on the Ancestry blog with a description of the new feature:

New AncestryDNA technology powers new kinds of discoveries

Ancestry have prepared some FAQs about the DNA Circles and have published a detailed white paper explaining the methodology. The white paper is a very interesting read but it is currently very hard to find if you are not included in any DNA Circles. To get to the white paper go to your matches page (not your DNA home page) and click on the question mark. Click on the tile with the magnifying glass labelled "What can I do with my DNA matches". Scroll down to the paragraph headed "Find DNA evidence for your genealogical research". At the end of that paragraph click on the green underlined lettering "Learn more about DNA Circles". That takes you to a page explaining how the DNA Circles are created. At the very bottom of the page there is a green link labelled "Check out our DNA Circles White Paper". I hope that Ancestry will make the paper easier to find so that more people will be encouraged to read it. Note that both the AncestryDNA white papers can only be accessed by Ancestry subscribers. (Thanks to Ann Turner on the ISOGG list for alerting me to this workround which was first posted in the Ancestry forums by Laura Davenport.)

It will be interesting to see how these circles work when AncestryDNA start to roll out their test in the UK and Ireland. I can see that the feature will work well for American genealogists. This is because a large percentage of the Ancestry subscriber base in the US appears to have deep roots in Colonial America, and they all trace back to a small founding population. Consequently Americans will often be related to each other on multiple ancestral pathways in the last three or four hundred years, which greatly increases the chances of them sharing DNA segments and finding connections. It also seems to be the case that family sizes in America in historical times were much larger than they were in the UK, which means that the gateway couples in America can often have literally thousands of living descendants. To put this problem into perspective it is cautionary to remember that the population of America in 1700 was just over 250,000, whereas the population of England in 1700 was over six million. It was not until some time after 1851 that the population of the US exceeded that of Great Britain and Ireland. We therefore have a much smaller population of living people who are tracing their ancestry back to a much larger population pool. Nevertheless I shall be very interested to see how this feature works when the AncestryDNA test finally becomes available over here.

Footnotes
1. The Gendocs website has a useful page on population statistics in the UK and Ireland over time:
http://homepage.ntlworld.com/hitch/gendocs/pop.html
2. For statistics on the US population see the Wikipedia article on the demographic history of the United States: http://en.wikipedia.org/wiki/Demographic_history_of_the_United_States

© 2014 Debbie Kennett

13 comments:

GizaCat said...

I am an American with roots deep in early Colonial VA, MD, and New England. I am really looking forward to seeing what comes of AncestryDNA tests being made available not only to the UK, but to Europe as well. I hope the technology has improved enough for your American cousins to connect with the descendants who stayed in the British Isles.

I have recent German immigrant ancestors and research shows heavy concentrations of my German surnames still in the same areas they were in back between 1845 - 1868. I just wish Germans were as curious about us as their American cousins are about them.

Debbie Kennett said...

It will be interesting to see what happens when the AncestryDNA test rolls out to other countries though the problem is that the DNA can only tell us so much. We are still reliant on the paper trails to find the connections, and unfortunately it's rare to find documentary records in early Colonial America that state where a person came from in the UK. It would help if Ancestry provided a segment triangulation facility so that you could search for common surnames on shared segments.

Stan said...

There is nothing 'improved' about it. Compared to what ancestry could be giving us, it is nothing more than a slightly improved 'shaky leaf' feature. And most of my 'false positives' I lost were not false at all. Ancestry should be giving us the broadest possible opportunity to associate our DNA with other people and they have done just the opposite.

Debbie Kennett said...

Stan

I'd much rather have fewer matches than a long list of false positive matches. The matches you lost were presumably distant cousins who appeared in your pedigree but were not related to you through their DNA. Ancestry have introduced some innovative new tools, and have shown commendable transparency by publishing two very detailed white papers. I know they don't offer segment triangulation but no doubt there will be more features to come. The biggest problem with Ancestry is that they are currently only selling their test in the US so it's not possible for me to associate my DNA with any relatives.

GizaCat said...

I am doing an experiment with my matches at GEDMatch, the wonderfully international site for DNA based genealogy. I have two paper trail/DNA matched cousins, one at a 4.0 level and another one at a 6.1 level. Now that I know how to eliminate false positives I did one-to-one comparisons with several other kits (one a mid-level match at Ancestry), including some "blind" kits from 23andMe and FTDNA with no paper trails. Using the two other kits from known DNA ancestors I'm now learning that some of these other people also share common ancestors with me. It's a kind of triangulation without false positives. Is what I'm doing making any sense to more experienced DNA Genealogy sleuths?

Debbie Kennett said...

Giza

What you need to do is work out if people match you on the same segment. See the links on the ISOGG Wiki page on triangulation:

http://www.isogg.org/wiki/Triangulation

There are people from all over the world on GedMatch but it appears to me that the database is about 90% or more American so it's not so good for finding connections for those of us who aren't American and don't have cousins who emigrated to America in the last couple of hundred years.

Unknown said...

Debbie, I am a Brit and I moved here just a few years ago. I am getting hundreds of hits with Americans - descendants of ancestors who must have settled here – I have some ideas, but I cannot yet verify how for sure (so many people just do not post trees to their DNA tests, or the UK ancestor labeling is vague/general). These bifurcated family lines are proving really tricky. All I can tell is that there are two periods and two approximate locations in America where a lot of activity seems to have been initiated. I am a newbie, so I am still learning. On FamilyTreeDNA I have tens of 2nd to 4th American cousins. On AncestryDNA.com, I have about the equivalent. I am having no luck in finding the connections, despite the very high, high confidence levels (post new algorithm). I did however lose a more recent Canadian 1st cousin 3 times removed when the November switch happened who was a very confident match before. Fortunately, I did the leg work to find them again. It makes me wonder how many others I might have lost. I know you would prefer the culled list than the long list of false positives, but the losses can be quite significant in the downsizing.

Debbie Kennett said...

I think many of these predicted second to fourth cousins at FTDNA are more towards the fourth cousin end of the range. If they were really that close I would have thought it should be possible to find the connection. I suspect we're picking up matches with people whose ancestors emigrated back in the 1600s. The early colonists all married each other and as a result of this pedigree collapse I think their descendants share larger than average segments. You see the same phenomenon in Ashkenazi Jews though to a much greater extent.

I would prefer that Ancestry would publish the details of the pile-up segments, their locations and sizes, and the number of people matching on these segments so that we can judge for ourselves whether or not real matches have been removed along with all the false positives. I think there is always going to be a delicate balance when trying to filter out the false positive matches without throwing out all the true matches as well. Fortunately we can upload our results to GedMatch to check these matches out for ourselves wherever possible.

Unknown said...

Debbie, thanks so much for posting. Phew, do I need a sanity check. I only began this in November and I am having trouble believing what I am seeing at times. The "build up", as you called it, is something I was intuiting by imagining what life must have been like in the early days. Actually, one of the periods where I am seeing activity is between the 1550s and 1600s. Two possible original ancestors, and then lots of activity. The other, a period of emigration from the UK, seems to begin from the early 1800s. I see patterns in names that keep cropping up, but I cannot put my finger on specifics. I have been using chromosome browser and Gedmatch and have note pads filled with notes and musings. I wonder how the 1st cousin 3X was culled however, since that is a close relative. I think my bad luck is missing paper trails, and not being willing to hedge bets, and of course being green at the subject area. Would you recommend working with a genealogist to help get things kick started? Cheers. Jeremy

Debbie Kennett said...

I'm not sure that it will ever be possible to find the genealogical connections with the majority of our matches who are predicted to be fifth to distant cousins. I can't even find the connections with any of the other Brits that I match. In any case many of those distant matches are likely to be false positives. You might want to have a read of the ISOGG Wiki page on IBD to understand the problem:

http://www.isogg.org/wiki/Identical_by_descent

You might want to join the ISOGG DNA Newbie group. It's quite a busy group but there's a lot of talk about methodology:

https://groups.yahoo.com/neo/groups/DNA-NEWBIE/info

Some people are trying to put their matches into triangulated groups (people who all match each other on the same segment on the same chromosome).
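The test for a triangulated group of three can be sketched in a few lines of Python. This is only an illustration of the definition above, not the method used by any particular tool. The positions are made-up base-pair coordinates and the function names are my own. Note that it is not enough for B and C each to match you in the same place: they must also match each other there, because each of us has two copies of every chromosome.

```python
from typing import NamedTuple, Optional


class Seg(NamedTuple):
    chrom: int  # chromosome number
    start: int  # start position in base pairs
    end: int    # end position in base pairs


def overlap(a: Seg, b: Seg) -> Optional[Seg]:
    """Return the region common to two segments, or None if there is none."""
    if a.chrom != b.chrom:
        return None
    start, end = max(a.start, b.start), min(a.end, b.end)
    if start >= end:
        return None
    return Seg(a.chrom, start, end)


def triangulates(me_b: Seg, me_c: Seg, b_c: Seg) -> bool:
    """True if all three pairwise matches share a common region."""
    common = overlap(me_b, me_c)
    return common is not None and overlap(common, b_c) is not None


# Made-up example: three pairwise matches on chromosome 7.
me_b = Seg(7, 10_000_000, 30_000_000)
me_c = Seg(7, 15_000_000, 40_000_000)
b_c = Seg(7, 12_000_000, 28_000_000)
print(triangulates(me_b, me_c, b_c))
```

A real comparison would also apply a minimum segment size, which is left out here to keep the sketch short.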

There's also a wonderful tool called the autosomal DNA segment analyser which is available on the DNAGedcom website. There are blog posts here explaining how it works:

http://www.isogg.org/wiki/Autosomal_DNA_tools

It essentially does the triangulation process for you. However, I've found so far that even identifying triangulated groups doesn't help because the people in the groups don't have genealogies traced back to the UK and don't have any surnames in common. I've got one triangulated group that seems to be all Irish but it predates 1800 and it's very difficult in general to do Irish research before this date.

What is really needed is more people testing from the British Isles so that we have more chance of finding genetic cousins with whom we have a reasonable chance of identifying the common ancestor.

Unknown said...

Thanks SO much again. I have managed, along with a common descendant in the UK, to trace back one direct line to b. 1498 over there with a very tight paper trail (with our common 3X great grand parents and backwards picked up en route). Then I picked up many DNA matches in the US with descendants of that same direct UK line (my gggfX12) who settled in the Colonies leaving half his kids in the UK (my gggfX11). The frequency of the DNA matches I am getting with his direct descendants over here in the US (both on FTDNA and Ancestry) is too incredible to be put down to coincidence/false – and the pattern of the same people through time to current time in the US seems to be forming with regularity. I'm beginning to second guess where a surname will be located in the colonies, and anticipate the names of the parents and children, etc. I understand the issues, but I hope that it is more than wishful thinking that when you get these multiple repetitions they become plausible. Might I have just lucked out on this one because of the good UK side paper trail? Followed by the Ancestor being one of a very small population way back when?
Interesting point you have about wishing more UK folks would test. Having grown up in the UK and spending many years living in Europe (Germany/Eastern Europe) when I was younger, I believe that DNA testing has a different ring to it culturally and psychologically with older folks after the experience of World War II (compared to over here). It might take a bit more trailblazing. Again, thanks so much for the insight and great direction. I'll dive in on the reading asap. Jer

Debbie Kennett said...

The difficulty with autosomal DNA is that you cannot determine whether a segment has come from a specific ancestor without doing detailed chromosome mapping. This is a costly and time-consuming process which involves testing lots of first, second and third cousins so that you can identify precisely where the segments have originated. See this article for more information on chromosome mapping:

http://www.isogg.org/wiki/Chromosome_mapping

Some lines are better documented than others and it could well be that with the bottleneck virtually everyone will be descended in one way or another from the one couple you've identified that you share in common but the segments that you share could have come from different ancestors. I would suggest you keep an open mind at the moment.