Sunday 24 March 2019

Advanced Genetic Genealogy: Techniques and Case Studies

I've been sworn to secrecy for the last two years but I am now pleased to announce the publication of a new book Advanced Genetic Genealogy: Techniques and Case Studies, edited by Debbie Parker Wayne. I contributed a chapter on "The promise and limitations of genetic genealogy" where I had great fun speculating about what the future holds for genetic genealogy. There are another thirteen chapters contributed by many well-known names in the genealogy world. I've not yet seen the book or had the chance to review any of the chapters so I'm very much looking forward to reading it when my copy arrives.

Debbie Parker Wayne has worked really hard behind the scenes to bring this much-needed book to fruition and I am very grateful for her patience and encouragement.

The book is currently on sale on Amazon. On the US site the book is showing as being available for shipping within the next one to two days. On the UK site delivery is expected within the next one to two months. A Kindle version will be available in May 2019. US readers who are going to the National Genealogical Society conference in May will be able to buy a copy from Books and Things, who will be exhibiting at the conference. For details see here.

Here is the description of the book from Amazon:
Advanced Genetic Genealogy: Techniques and Case Studies helps intermediate researchers move up to the next level and advanced researchers apply the new DNA standards and write about DNA. This new book offers an in-home course in advanced genetic genealogy. Case studies demonstrate analyzing the DNA test results, correlating with documentary evidence, and writing about the findings, all incorporating the updated standards for using DNA. Full-color illustrations help the genealogist incorporate these techniques into personal or client research projects. Each of the fourteen chapters was written by a professional genealogist with DNA experience. 
Eight chapters study real families (some using anonymized identities), including methods, tools, and techniques. Jim Bartlett covers how to triangulate a genome (mapping DNA segments to ancestors). Blaine T. Bettinger demonstrates the methodology for visual phasing (mapping DNA segments to the grandparents who passed down the segment to descendants, even when the grandparents cannot be tested). Kathryn J. Johnston shows how to use X-DNA to identify and confirm ancestral lines. James M. Owston describes findings of the Owston Y-DNA project. Melissa A. Johnson covers adoption and misattributed parentage research. Kimberly T. Powell provides guidance when researching families with endogamy and pedigree collapse. Debbie Parker Wayne combines atDNA and Y-DNA in a Parker family study. Ann Turner describes the raw DNA data and lab processes. 
Three middle chapters cover genealogy standards as they relate to DNA and documentary evidence. Karen Stanbary applies the Genealogical Proof Standard to genetic genealogy in a hypothetical unknown parentage case illustrating start-to-finish analysis. Patricia Lee Hobbs uses atDNA to identify an unknown ancestor and that ancestor's maiden name, moving back and forth between documentary and DNA evidence. Thomas W. Jones describes best practices for genealogical writing and publishing when incorporating DNA evidence. 
Three concluding chapters deal with ethics, emotions, and the future. Judy G. Russell covers ethical considerations. Michael D. Lacopo describes the effect on relationships when family secrets are uncovered, surfacing issues for all concerned. Debbie Kennett covers the current limitations and future promise of using DNA for genealogy. An extensive glossary, list of recommended resources, and index are included.
If you click on the Amazon UK links in this blog it is vaguely possible that at some point in the distant future I might receive a microscopic payment from Amazon as part of their affiliate scheme to help support my writing. Using the affiliate links makes no difference to the prices you pay. 

Tuesday 5 March 2019

Ancestry updates at Rootstech – ThruLines, Tree Tags and Improved DNA Matches

At Rootstech last weekend Ancestry announced the launch of three new features: Tree Tags, ThruLines, and New and Improved Matches. The announcement was made in the keynote speech by Margo Georgiadis, Ancestry's new CEO, which you can now watch on YouTube.

Margo stated that Ancestry host 100 million family trees, and that they "will soon have more than 15 million people" in their DNA network, making it the largest consumer DNA database in the world.

Crista Cowan provided a live demonstration of the new tools in her presentation "What you don't know about Ancestry". The recording of her talk is now available on the Rootstech website and is well worth watching if you want to get a good overview of the new features. I also recommend watching Diahan Southard's talk on Connecting your DNA matches, which shows how to use the tools to form genetic networks, but also highlights some of the limitations.

The Tree Tags and New and Improved Matches are currently in beta testing. You can opt in to the beta by accessing the new AncestryLab menu on your Ancestry account. This can be found under the Extras tab. ThruLines has been rolled out to the entire AncestryDNA database and you will see the feature when you log into your DNA account. The screenshot below shows what my home page now looks like.

The new matches experience replicates the functionality of the third-party Chrome extensions MedBetter DNA and DNA Match Labelling, which many of us found very helpful for managing our matches. These extensions are now redundant. If you have previously used the extensions you need to be aware that they won't work with the new system. Before participating in the beta you might want to make a note of all your groupings and tags so that you can transfer them to the new interface.

Because these new tools are all in beta testing it's important to remember that the features may change and new functionality might be added. If you spot any bugs or have suggestions for improving the tools make sure you submit feedback to Ancestry.

So now let's have a look at the Improved Matches and ThruLines features and see how they work in practice.

New and improved DNA matches
I currently have 24,900 matches at AncestryDNA, which is far more than I could ever possibly hope to investigate. Any tools that will help me to sort and filter these matches to find the most useful ones are always going to be welcome. Below is a screenshot of my new matches page. I've tested both my parents and matches are now allocated to the father's side and the mother's side. However, the parental sides are only shown for my 162 fourth cousin and closer matches. It would be really helpful if Ancestry could extend this tool to show the sides for the more distant matches, or at least for the first 1000 or 2000 matches on your match list. You will also notice that the amount of shared cM has been rounded up to the nearest whole number.

I always write fastidious notes about my matches and the notes can now be viewed directly from the matches page without having to click through to view the match. This functionality was previously only available when using the MedBetter DNA extension. You won't be able to see the full note on the matches page, but you can click on the note symbol, as in the screenshot above, to read the full text.

Previously we could view 50 matches on a page. Now there is an infinite scroll system which means that you can keep scrolling down the page and see more and more matches. This is very handy if you're trying to search by keyword for particular matches. I can now, for example, see all my fourth cousin and closer matches on a single page. Previously you could use the page numbers to work out how many pages of matches you had, which then allowed you to calculate the total number of matches. I can't currently see any way of replicating this function with the new system. (*See the update at the end of this article.) If you want to know how many matches you have you can use the DNAGedcom Client to download your entire match list. The Client is available for a modest subscription from DNAGedcom.

The updated match list has a number of new filters for sorting your matches. You can sort by close matches, distant matches, matches you haven't viewed, matches with notes, matches you've messaged and tree status (private linked trees, public linked trees and unlinked trees). Unfortunately the facility to filter matches by sub-region has temporarily been lost but I understand that it will eventually be restored.

The good news is that it's now possible to create custom groups which can be labelled and assigned a colour. This new feature is modelled very closely on the DNA Match Labelling extension, but provides additional functionality such as the ability to filter matches by custom group. You can also have more than one coloured dot for each match. There are 24 different colours available. I'm still experimenting with the coloured dots but an obvious use of the custom groups is to assign different colours to specific surnames or ancestral couples.

I've added a coloured dot for matches which don't have any shared matches. I can use this filter to go back and check these matches from time to time to see if they do now have any shared matches. Currently 23 of my 162 fourth cousins or closer (14%) don't have any shared matches. It will be interesting to see if the percentage drops over time as more matches start to come in.

I also have a coloured dot for what I call "dodgy matches". These are matches which share substantially less DNA with my parents than they do with me and are therefore not likely to be worth pursuing. Currently 29 of my 162 fourth cousin or closer matches (18%) fall into this dodgy category. In some cases there is a discrepancy of 10 cM or more. One match appears to share 22.3 cM with me but shares just 6.5 cM with my dad. Two of my dodgy matches actually share small amounts of DNA with both of my parents. One appears to share a single 21.2 cM segment with me but this is actually two segments: a 7.8 cM segment shared with my mum and a 6.5 cM segment shared with my dad. My second double-sided match shares 20.1 cM across 2 segments with me but this translates into a 6.1 cM segment shared with my mum and an 11.9 cM segment shared with my dad. I've not found any genealogical connection between my mum and dad within the last 400 years and these double matches are likely to be signals of very distant sharing dating back hundreds of years.
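The check I use for spotting dodgy matches can be written down as a short Python sketch. This is only an illustration of my own manual comparison, not anything Ancestry provides: the `Match` class, the match labels and the 5 cM threshold are all hypothetical, and I've assumed that the first match shares nothing with my mum because only the amount shared with my dad is noted above.

```python
from dataclasses import dataclass


@dataclass
class Match:
    label: str      # hypothetical label, not a real match name
    me_cm: float    # cM the match shares with me
    mum_cm: float   # cM the match shares with my mum
    dad_cm: float   # cM the match shares with my dad


def discrepancy(match: Match) -> float:
    """cM shared with me minus the larger amount shared with either parent."""
    return round(match.me_cm - max(match.mum_cm, match.dad_cm), 1)


def is_dodgy(match: Match, threshold_cm: float = 5.0) -> bool:
    """Flag a match that shares far less with my parents than with me.

    The threshold is an assumed cut-off chosen purely for illustration.
    """
    return discrepancy(match) >= threshold_cm


def is_double_sided(match: Match) -> bool:
    """True when the match shares some DNA with both of my parents."""
    return match.mum_cm > 0 and match.dad_cm > 0


# The three examples described above (mum_cm for match A is an assumption).
matches = [
    Match("match A", me_cm=22.3, mum_cm=0.0, dad_cm=6.5),
    Match("match B", me_cm=21.2, mum_cm=7.8, dad_cm=6.5),
    Match("match C", me_cm=20.1, mum_cm=6.1, dad_cm=11.9),
]

for m in matches:
    print(m.label, discrepancy(m), is_dodgy(m), is_double_sided(m))
```

Comparing against the larger of the two parental amounts is a deliberately cautious choice: a match is only flagged if neither parent comes close to the amount reported for me.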

The screenshot below shows some of the custom groups I've started to use. No doubt my system will evolve over time. The coloured dots currently sort alphabetically so I might at some point decide to introduce a numbering system to get them to sort in a particular order.

Common ancestors
AncestryDNA's shared ancestor hints, otherwise known as shaky leaf hints, have now been renamed as common ancestors. I previously only ever had six shared ancestor hints and two of those were with my parents. I now have 20 common ancestors, though again two of those are with my parents. The shaky leaf hints only used our family tree and the trees of our matches to identify a shared ancestor. The common ancestors feature compares family trees and then deploys the power of other people's family trees to try and identify the common ancestor. The common ancestor feature is a useful tool to guide your research but every link will need to be carefully checked, and extreme caution needs to be exercised when matches are identified with more distant cousins.

Below is an example of a predicted common ancestor with a half sixth cousin. Ancestry have used my tree and my cousin's tree and then used a third-party tree to link the two lines together. Tidbury is indeed a name which appears in my family tree. The surname is concentrated in Berkshire and Hampshire and it's quite likely that I share a genealogical relationship with this match. However, the Tidburys are on my mother's side and this cousin matches me through my father so the shared DNA clearly does not come through the Tidbury line. Even if the genetic connection were real, the total amount of DNA shared is very small and there is currently no way of assigning a single small segment like this to a sixth cousin with confidence.

ThruLines is a replacement for DNA Circles. The DNA Circles feature will continue in parallel to ThruLines for now but is no longer going to be updated. While DNA Circles was an interesting concept, in practice it was only of use to a subset of the AncestryDNA database because of the strict requirements needed to create a circle. I only ever had one DNA Circle, but AncestryDNA have identified 57 potential ancestors for me with the ThruLines feature.

ThruLines is now in open beta testing and any Ancestry member who meets the following criteria will receive the feature free of charge for a limited time:
  • Your AncestryDNA results must be linked to a public or private searchable family tree.
  • You must have DNA matches who have also linked their results to a public or private indexed family tree.
  • Your linked family tree needs to be well built out. It should be 3-4 generations deep to have the best chance of ThruLines finding new discoveries for you to explore.
AncestryDNA have said that the feature will come out of beta testing when they have had enough feedback to validate the value of the tool to their customers, including whether or not the feature will require a subscription.

With ThruLines you get a DNA record card for each ancestor for whom you have a DNA match. If you click on the card you will get a report showing the possible pathways through which you and your matches are connected to the ancestor. The pathways are filled in not just from your own family tree or the family trees of your matches but from multiple Ancestry family trees. This is essentially a machine-learning algorithm deploying the power of big data to make connections. I would guess that, like the We're Related App, the feature is generated by the AncestryDNA Big Tree. For information about the Big Tree I recommend reading the very informative blog post by Randy Seaver "Is there really an Big Tree?"

You can filter the ThruLines in three ways.
  1. If you filter by ancestors from your linked tree you will be given descendant reports on your matches who meet the required criteria and who share the ancestors featured in your tree. This is a very useful way of locating matches who descend from a specific ancestor.
  2. If you filter by potential ancestors you will be able to view links that Ancestry have made between you and your matches by stitching together multiple family trees to make connections.
  3. The third option is all ancestors, which will show you all your potential ancestors and all ancestors from your linked tree.
ThruLines goes back seven generations to fifth great grandparents which means that you will not get potential ancestor hints with anyone who is not a sixth cousin or closer.
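The arithmetic behind the seven generations and the sixth cousin limit is easy to sketch in Python. The function names here are my own and purely illustrative. The rule is that an nth great-grandparent is n + 2 generations back from you, and two people in the same generation who share nth great-grandparents are (n + 1)th cousins.

```python
def generations_back(great_count: int) -> int:
    """Generations from you to an nth great-grandparent (parents count as 1)."""
    return great_count + 2


def cousin_degree(great_count: int) -> int:
    """Cousin degree for same-generation descendants of nth great-grandparents."""
    return great_count + 1


# Fifth great-grandparents, the furthest back that ThruLines goes.
print(generations_back(5))  # 7 generations
print(cousin_degree(5))     # sixth cousins
```

The same rule gives the familiar closer relationships: with `great_count` set to 0 (plain grandparents) it returns two generations and first cousins.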

The example below shows a genetic descendancy report for my great-great-great-grandfather Samuel Trask. It shows my line of descent on the left and then the lines of three of my matches and their lines of descent down to the present day with the speculative links highlighted with dotted boxes. In this particular case the connections were hard to make because these matches were related to me through a female line. I hadn't noted these changes of surname upon marriage in my family tree so I might not have spotted these connections easily without this feature. However, caution will need to be exercised and the trees of these matches will need to be evaluated independently. I will also have to check the amount of shared DNA to see if it is consistent with the hypothesised relationships.

The example below shows a potential ancestor that Ancestry have identified for my great-great-great-grandfather Thomas Thorn. Two cousins who are potentially related to me through Thomas Thorn are shown, and Ancestry have also identified William Thorn as Thomas's possible father because of a DNA connection through Robert Browning Thorne. I have insufficient evidence at present to evaluate this possible link but it may be that I will get more matches on these lines in future which will provide further clues.

One disadvantage of the ThruLines is that the potential ancestor feature is triggered even if your only DNA match is with a parent. As I've tested both of my parents I'm finding that a lot of the potential ancestors are only appearing because of a single parental match. Ideally parents should be excluded from the feature or there should be the ability to filter out matches with parents.

The feature seems to be most useful for identifying how you are connected to your DNA matches. It's particularly helpful for the fifth to eighth cousin matches which are otherwise very difficult to search. In many cases the matches only share very small amounts of DNA (under 10 cM) and I'm not convinced that the DNA match is a result of the documented genealogical connection. Nevertheless, it's still useful to identify more genealogical cousins and to have the opportunity to communicate with them about our shared surnames and family trees.

A nifty bonus feature of the Improved DNA Matches and the ThruLines is that you can now generate a table showing the possible relationships and the probabilities of those different relationships. This is very similar to the Shared cM Tool on the DNA Painter website. The DNA Painter tool uses probabilities based on simulated data from the AncestryDNA Matching White Paper combined with ranges derived from the Shared cM Project. These new tables from Ancestry are based on simulations. They've not released any details but I'm hoping that they will publish a white paper or blog explaining the methodology behind their calculations. You can access the probabilities table from your new matches page by clicking on the question mark next to the amount of shared DNA. You can access the tool from the ThruLines feature by clicking on the amount of DNA you share with your matches in the descendancy chart.

Here is the probability table from Ancestry for one of my second cousins with whom I share 168 cM.

Interestingly, Ancestry have predicted that this person is my third to fourth cousin whereas the probabilities show that he is more likely to be a second cousin.

If I plug the amount of shared DNA (168 cM) into the Shared cM Tool at DNA Painter I get the following probability table which suggests that this cousin is slightly more likely to be a half second cousin than a full second cousin but still has a reasonable probability of being a second cousin.
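
As a rough sanity check on where 168 cM sits, here is a minimal Python sketch of the textbook expected sharing for full and half cousins. This is not the method used by either Ancestry or DNA Painter, which rely on simulations and reported ranges. The function names are my own and the 6800 cM genome total is an assumed round figure, since the companies report slightly different totals. The calculation only gives averages, and the actual amounts shared by real cousins vary widely around them.

```python
ASSUMED_TOTAL_CM = 6800.0  # assumed round figure for the total, not an Ancestry value


def expected_fraction(cousin_degree: int, half: bool = False) -> float:
    """Average fraction of autosomal DNA shared by nth cousins.

    Each common ancestor is (n + 1) generations above both cousins, so the
    path between them passes through 2 * (n + 1) meioses. Full cousins share
    two common ancestors and half cousins share only one.
    """
    meioses = 2 * (cousin_degree + 1)
    common_ancestors = 1 if half else 2
    return common_ancestors * 0.5 ** meioses


def expected_cm(cousin_degree: int, half: bool = False,
                total_cm: float = ASSUMED_TOTAL_CM) -> float:
    """Average shared cM under the assumed genome total."""
    return expected_fraction(cousin_degree, half) * total_cm


print(expected_cm(2))             # full second cousins
print(expected_cm(2, half=True))  # half second cousins
```

Under these assumptions the averages come out at about 212 cM for full second cousins and about 106 cM for half second cousins, so a match of 168 cM falls between the two, which is consistent with both relationships appearing in the probability tables.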
I've found that Ancestry's probabilities are not so realistic for the more distant relationships in the fifth to eighth cousin category where people share much smaller amounts of DNA. Most of the matches identified with the ThruLines feature appear to fall into this category. The Ancestry simulations don't seem to have gone beyond the fifth cousin level. As a result, the probabilities suggest that almost all of your DNA matches, however little DNA they share, will be fifth cousins or closer. Here's the probability table for the predicted half sixth cousin I mentioned above who shares just 10 cM with me. The table suggests that 98% of cousins sharing 10 cM will be fifth cousins or closer. A half sixth cousin doesn't even appear as a possibility.

This prediction is actually in conflict with Ancestry's confidence scores. Matches sharing 6-16 cM are classified as moderate confidence matches with only a 15-50% chance of sharing a single recent common ancestor. Ancestry say that for moderate confidence matches "You and your match might share DNA because of a recent common ancestor or couple, share DNA from very distant ancestors, or you might not be related."

Two separate studies have also shown that 10 cM segments can potentially go back a long way and can perhaps date back twenty or thirty generations. A chart from a 2012 study by Speed and Balding reproduced in the ISOGG Wiki shows that less than 40% of 10 cM segments are likely to fall within the last 10 generations. The Speed and Balding paper used computer simulations. A 2013 study from Ralph and Coop, using real life data from the European POPRES dataset, showed that while most 10 cM segments are likely to come from the last 500 years, we have many more very distant cousins so the vast majority of 10 cM segments will be shared through very distant ancestors. The authors go on to say that "the typical age of a 10 cM block shared by two individuals from the United Kingdom is between 32 and 52 generations". The distribution will vary depending on a given population's shared history.

I would suggest that Ancestry need to refine the parameters of their simulations to allow for more distant sharing and to produce more realistic probability tables for matches sharing low amounts of DNA. In the meantime, when weighing up the strength of DNA evidence, you need to bear in mind the uncertainties in the predictions when sharing small amounts of DNA. If you share a single 10 cM segment with a cousin there is currently no way of determining whether you are matching as sixth cousins or sixteenth cousins.

These new features from Ancestry are very welcome. The new matching interface is already making it much easier to navigate and sort our matches. ThruLines is likely to be a valuable tool for identifying potentially interesting matches to follow up. The predictions are likely to get better over time as more data becomes available and as the machine learning improves. Remember that any tool has to be used carefully. It provides clues but does not give you all the answers. You will still need to evaluate the DNA evidence in combination with the genealogical evidence.

Update 7th March 2019
Check out the All Matches filter in your new match list. AncestryDNA have now included stats on the total number of matches, the number of close and distant matches, the number of new matches and matches shared with your mother or father. I now have 25,089 matches, 166 of which are predicted to be fourth cousins or closer.

Further reading and resources