Tuesday, 19 April 2016

Changes to the AncestryDNA matching algorithms and downloading your AncestryDNA matches with the DNAGedcom Client

AncestryDNA announced recently that they will shortly be updating their matching algorithms thanks to new advances in DNA science. Yesterday some of the genetic genealogy bloggers in the US attended a conference call with AncestryDNA and were given a preview of the changes. Blaine Bettinger has provided a detailed overview in his blog post entitled Ancestry DNA plans update to matching algorithms. The new algorithms should provide a more accurate list of matches with fewer false positives and false negatives.

I hope to be able to do a comparison of my matches before and after the changes. Here is my match page as it currently stands.

I have 69 pages of matches which, at 50 matches per page, is around 3450 matches. Of these, 28 are fourth cousins. I have one shaky leaf hint. I don't have any New Ancestor Discoveries and I'm not yet in any DNA Circles.

As AncestryDNA do not provide the facility to download your list of matches, I have used the DNAGedcom Client. DNAGedcom is a free set of autosomal DNA tools provided by Rob Warthen. It offers a number of utilities which are helpful for anyone interested in doing more detailed analyses of their autosomal DNA results from the three major providers (23andMe, AncestryDNA and Family Tree DNA). One of the most popular tools hosted on the DNAGedcom website is Don Worth's Autosomal DNA Segment Analyzer. This allows you to get a visual representation of your matches on a chromosome by chromosome basis. Sue Griffith has written a detailed review in her article Autosomal DNA Segment Analyzer (ADSA): no spreadsheets required!

The DNAGedcom Client is an add-on facility which is available by subscription for a very reasonable charge. I've just paid for one month's access which cost me US $5.00. I was able to pay by PayPal and the cost worked out at £3.61. If you prefer you can take out an annual subscription for $50 (about £35).

Rob Warthen has provided instructions on how to use the DNAGedcom Client in his blog post Welcome to the DNAGedcom Client. The instructions are very easy to follow so I won't repeat them here.

One of the nice features of the program is that it operates from your own computer. It is a three-stage process. The first stage is to download your list of AncestryDNA matches. I found that it took about half an hour for my matches to be downloaded with a fast fibre-optic internet connection. The download time will vary depending on the number of matches you have and the speed of your internet connection.

The next stage is to download the family trees of your matches. This was a much longer process and took about one hour and fifteen minutes.

The final stage is to download your "in common with" matches. This download was much quicker and was completed in under fifteen minutes.

Having completed the process I was then able to access the downloaded files on my computer. The most interesting one is the spreadsheet with my list of matches. Here is a screenshot showing the columns and the information provided (click on the image to enlarge it). As you can see, the information also includes the number of shared centiMorgans and the number of shared segments.

I have a total of 3414 matches, but 1737 of these matches (51%) share a single segment under 6 cM in size. 704 of my matches share segments of 6.00 to 6.99 cMs in size. 719 matches share segments of 7.0 to 9.9 cMs in size. That leaves me with 254 matches with segments that are 10 cMs or more. Of these, only 40 have segments that are 15 cMs in size or bigger.
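For anyone who wants to produce the same kind of breakdown from their own downloaded match list, here is a minimal Python sketch of the tally. It assumes you have already pulled the shared centiMorgan figures out of the spreadsheet into a plain list of numbers. The names `BINS`, `bin_label` and `summarise` are made up for illustration and are not part of the DNAGedcom Client, and the sample values are invented. The last few lines simply cross-check the counts quoted above.

```python
from collections import Counter

# Bin edges in centiMorgans, mirroring the ranges quoted in the post.
# Each bin includes its lower edge and excludes its upper edge.
BINS = [
    ("under 6", 0.0, 6.0),
    ("6 to under 7", 6.0, 7.0),
    ("7 to under 10", 7.0, 10.0),
    ("10 to under 15", 10.0, 15.0),
    ("15 or more", 15.0, float("inf")),
]


def bin_label(shared_cm):
    """Return the label of the bin that a shared-cM value falls into."""
    for label, low, high in BINS:
        if low <= shared_cm < high:
            return label
    raise ValueError(f"shared cM must be a non-negative number, got {shared_cm}")


def summarise(shared_cm_values):
    """Map each bin label to (count, whole-number percentage of all matches)."""
    counts = Counter(bin_label(v) for v in shared_cm_values)
    total = len(shared_cm_values)
    return {
        label: (counts[label], round(100 * counts[label] / total) if total else 0)
        for label, _, _ in BINS
    }


# Made-up values for illustration only; not taken from a real match list.
sample = [5.2, 5.9, 6.5, 8.0, 12.0, 20.0]
summary = summarise(sample)

# Cross-check of the counts quoted in the post.
post_counts = {
    "under 6": 1737,
    "6 to under 7": 704,
    "7 to under 10": 719,
    "10 or more": 254,
}
post_total = sum(post_counts.values())
post_under_6_pct = round(100 * post_counts["under 6"] / post_total)
```

The four counts quoted in the post add up to the stated total of 3414, and the under 6 cM group works out at 51% of that total, so the figures are consistent with each other.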

There is also a spreadsheet showing the names of the ancestors of my matches. There are 152,963 names in this spreadsheet. This list is potentially very useful to help me to identify the people in my match list who actually have ancestry in the UK where I might have a realistic chance of working out the genealogical connection. However, scrolling through the list it seems that the vast majority of the ancestors of my matches are in the US (Virginia, Maryland, Tennessee, Massachusetts, etc) and there are also quite a few in Canada (primarily Nova Scotia, New Brunswick and Quebec). I do have known relatives in Prince Edward Island in Canada but these are much more recent and should show up as high confidence matches sharing large segments of DNA.

Whether or not I can actually do anything with the information in these spreadsheets is a different matter altogether. Even with the phased data from Ancestry there is still likely to be a high false positive rate with all these very small segments. If the small segments are IBD (identical by descent), they are likely to be a reflection of very distant shared ancestry going back ten or twenty generations or more. Even if the connections are more recent, with the best will in the world it's impossible to find links with distant cousins whose ancestry is all in Colonial America.

In addition to my list of matches I now also have a spreadsheet with a list of my "in common with" matches. There are 96 names on this list which fall into sixteen groups.

I also have a folder labelled Tree Cache which I've not yet had time to investigate. The trees are designed to be used with the GWorks utility on DNAGedcom. However, as I have so few matches in my match list where I stand a reasonable chance of finding a genealogical connection it's probably not worth my time and effort to use this feature at present.

We don't yet know when AncestryDNA are going to roll out their new matching algorithms. It could happen tomorrow or it could happen some time in the next couple of weeks. If you want to experiment by downloading your matches with the DNAGedcom Client then I would recommend doing so sooner rather than later. I've certainly found it an interesting exercise to analyse my match list in this way and I shall be interested to see how my matches compare once the new and improved algorithms are rolled out.

We should start to see many more UK matches appearing in our match list over the course of the next year. AncestryDNA were doing a roaring trade at Who Do You Think You Are? Live, and the database is growing very rapidly. I'm also looking forward to seeing how the DNA Circles and New Ancestor Discoveries work.

Update 20th April
See also this blog post from AncestryDNA: New advances in DNA science coming your way

© 2016 Debbie Kennett


Dan Edwards said...

"...but 1737 of these matches (51%) share a single segment under 6 cM in size."

Even those of us in the US will have a great number of "Moderate" (AncestryDNA label) matches.

However, your low number of estimated "4th" (and remember, at AncestryDNA that category is really "4th to 6th" but they just shorten that to "4th" in most instances) cousins is no doubt a reflection of your location in the UK and non-US ancestors.

For the changes coming at AncestryDNA I am glad, but I was also glad for Autosomalgeddon, which puts me in a minority it seems.

Debbie Kennett said...

We're about three years behind you in terms of the usefulness of the AncestryDNA database but I have high hopes for the future. I do think that we will get proportionately fewer "fourth cousin" matches because in general the family sizes here were smaller and there was less endogamy (in England at least - it's a different matter in Wales, Ireland and Scotland).

I was also glad about Autosomalgeddon and I'm very pleased that AncestryDNA are further refining their matches. I shall look forward to finding out what they have in store for us.

diananel said...

I've got 396 4th or closer. I started the match gathering around 4 PM today and it's about 60% complete at 7:10. I might have to let my husband's run over night. (1440 4th or closer) And then there's my brother and my husband's 4 cousins. Yikes.

Debbie Kennett said...

Good luck Diana! With that many matches you probably don't need to worry about saving them all anyway. I'm only interested in saving mine as an academic exercise to see the effects of the new matching algorithms.