Thursday, 28 January 2016

Autosomal DNA triangulation. Part 1: the basics

There have been a lot of discussions in the genetic genealogy community in the last few months in the ISOGG Facebook group and on the ISOGG DNA Newbie list on the subject of triangulation for autosomal DNA. As a contribution to the debate I thought I would take the opportunity to share my own understanding of all the issues involved. This is the first of two articles on the subject. I will start by covering some of the basic principles of autosomal DNA inheritance and triangulation. In the second part I will look at the phenomenon of triangulated segments.

One of the difficulties I've found is that people are using the term triangulation in different ways to mean different things. Triangulation is a term that has been adapted from surveying. It was first used in genetic genealogy in the context of Y-chromosome DNA (Y-DNA) and mitochondrial DNA (mtDNA) testing by Bill Hurst, who proposed the following definition on the Rootsweb Genealogy DNA list in 2004:
Triangulation: In genetic genealogy, the determination of the Y-chromosome DNA of a male ancestor by finding an exact match between direct paternal descendants of two sons of the ancestor. Similarly, the determination of the mitochondrial DNA of a female ancestor by finding an exact match between direct maternal descendants of two daughters of the ancestor. 
Autosomal DNA is of course a lot more complicated than Y-DNA and mtDNA. We receive one set of 22 autosomes from our father and one set of 22 autosomes from our mother. However, before the DNA is passed on from parent to child it undergoes a process known as recombination, which means that it gets shuffled up before it’s passed on. We receive 50% of our DNA from our mother and 50% from our father, but the DNA that we receive from our parents is a patchwork of the DNA from all four of our grandparents. Sometimes we will inherit an entire chromosome from one of our grandparents, but more often than not each of our autosomes will be split into a small number of large segments, typically one or two from each grandparent on that side of the family. You can see this process in action in the screenshot below. The comparison has been done using the Family Tree DNA chromosome browser and is from the point of view of my son. His DNA is compared with me (pink), his father (green), his maternal grandfather (orange) and his maternal grandmother (blue). You can see that the segments of DNA he has inherited from his maternal grandparents have been broken up into large chunks, though on chromosome 18 he has inherited the entire chromosome from his maternal grandmother, and he has received his entire chromosome 22 from his maternal grandfather.
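To make the shuffling concrete, here is a minimal simulation sketch. It assumes a deliberately simplified model that is not taken from this article: crossovers occur independently along a chromosome at an average rate of one per 100 cM (a Poisson process that ignores crossover interference and male/female differences). The labels "GF" and "GM" (the transmitting parent's father and mother) and the 70 cM chromosome length in the usage lines are hypothetical, illustrative choices.

```python
import math
import random

def transmit_chromosome(length_cm, rng):
    """Simulate one parent-to-child transmission of a single chromosome.

    Returns (start_cm, end_cm, source) tuples, where source is "GF" or "GM"
    (the transmitting parent's father or mother). Assumes crossovers form
    a Poisson process with a mean of one crossover per 100 cM.
    """
    # Gaps between crossovers are exponential with a mean of 100 cM.
    crossovers = []
    pos = rng.expovariate(1 / 100)
    while pos < length_cm:
        crossovers.append(pos)
        pos += rng.expovariate(1 / 100)
    # Start on a random grandparent and switch source at every crossover.
    source = rng.choice(["GF", "GM"])
    segments, start = [], 0.0
    for x in crossovers:
        segments.append((start, x, source))
        source = "GM" if source == "GF" else "GF"
        start = x
    segments.append((start, length_cm, source))
    return segments

def prob_whole_chromosome(length_cm):
    """Chance of zero crossovers, i.e. the whole chromosome comes from one grandparent."""
    return math.exp(-length_cm / 100)

rng = random.Random(2016)
for start, end, source in transmit_chromosome(70.0, rng):
    print(f"{start:5.1f}-{end:5.1f} cM from {source}")
print(prob_whole_chromosome(70.0))
```

Under this simplified model a chromosome of that hypothetical length is passed down with no crossover at all about half the time (exp(-0.7) is roughly 0.50), which helps explain why whole chromosomes are sometimes inherited intact from a single grandparent.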


While we share large chunks of DNA in common with our grandparents, the number of shared segments and the size of those segments get smaller with each passing generation, and we eventually reach a point where we have genealogical ancestors from whom we have inherited no DNA at all.

Autosomal DNA triangulation works on the same principles as triangulation for Y-DNA and mtDNA. We start with the known and work back to the unknown, and we combine DNA evidence with sound genealogical evidence to draw a conclusion. For autosomal DNA we are looking at specific segments of DNA and trying to determine the ancestor or ancestral couple from whom we inherited that DNA. For this process to work we need relatives who are closely related to us with known genealogies. If you test two known first cousins and they have the expected amount of DNA in common for a first cousin relationship you can assign the shared segments to their mutual grandparents. Similarly if you test two second cousins and they share the appropriate percentage of DNA in common you can infer that the shared segments have been inherited from their mutual great-grandparents.
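The "expected amount" of shared DNA comes from simple halving arithmetic. A minimal sketch, assuming full cousins who descend from a single ancestral couple with no pedigree collapse or endogamy: each of the two shared ancestors is n + 1 generations above each cousin, so each ancestor contributes (1/2) ** (2n + 2) and the couple together contribute (1/2) ** (2n + 1). These are averages only; the amount actually shared by a given pair of cousins varies around them.

```python
def expected_shared_fraction(n):
    """Average fraction of autosomal DNA shared by full nth cousins (n = 1 for first cousins).

    Assumes one shared ancestral couple n + 1 generations above each cousin,
    with no pedigree collapse or endogamy.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return 0.5 ** (2 * n + 1)

for n in range(1, 6):
    print(f"cousin degree {n}: {expected_shared_fraction(n):.3%}")
```

This gives 12.5% for first cousins and 3.125% for second cousins, falling below 0.1% by fifth cousins, which is one reason the more distant relationships are so much harder to assign.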

The technique can also be used with third, fourth and fifth cousins but it is important that both parties have sound genealogies and are able to trace back their ancestors on all their family lines for the appropriate number of generations in order to rule out the possibility of a relationship on a different pathway. The assignment of segments to fourth and fifth cousins is more secure if the match can also be triangulated with other close family members (eg, a parent, an aunt or uncle, a first or a second cousin).

Triangulation can be used in combination with chromosome mapping, a technique which is deployed by some of the more advanced genetic genealogists in our community. Chromosome mapping opens up exciting possibilities, and has the potential to enable us to make a partial reconstruction of the genome of our ancestors. This will eventually allow us in some cases to determine which traits, such as hair colour and eye colour, we can attribute to specific ancestors. Such an exercise has already been done by AncestryDNA, who were able to reconstruct about 50% of the genome of David Speegle, a man who lived in Alabama in the early 1800s. David Speegle was chosen for the exercise because he had two wives, Winifred Crawford and Nancy Garren, and also an exceptionally large number of children who in turn went on to have lots of children. This meant that Speegle had many surviving descendants in the AncestryDNA database, which made the task of reconstruction a lot easier. I hope that AncestryDNA will eventually publish this research in a scientific journal. In the meantime it's instructive to look at this video which explains the methodology and to study David Speegle's chromosome map (starting at around 1 minute 58 seconds) showing all the reconstructed segments scattered across his 22 autosomes. It is ironic that AncestryDNA uses a chromosome map to demonstrate this concept but that they do not currently provide their customers with the matching segment data or a chromosome browser so that we can replicate the methodology ourselves.

Another interesting study has been done by genetic genealogist Kitty Cooper. She has created a chromosome map showing the segments she has been able to attribute to her great-great-grandparents Jørgen and Anna Wold of Drammen, Norway. Reconstructing the genomes of our ancestors is rather like trying to do a giant genetic jigsaw puzzle. One of your cousins might have the segment containing the alleles for your ancestor's brown eyes, and another cousin might have the segment with the alleles for his brown hair. We don't all inherit the same piece of the jigsaw puzzle but instead we all inherit different pieces which can be joined together to reconstruct the bigger picture.
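The jigsaw idea maps naturally onto merging overlapping intervals. The sketch below uses invented segment positions for three hypothetical cousins, combines the segments each cousin has been assigned to one ancestral couple, and reports how much of a single hypothetical 200 cM chromosome is covered. It illustrates the bookkeeping only and is not the method used by AncestryDNA or Kitty Cooper.

```python
def merge_segments(segments):
    """Merge overlapping or touching (start, end) intervals on one chromosome."""
    merged = []
    for start, end in sorted(segments):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous piece: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(piece) for piece in merged]

def coverage_fraction(segments, chromosome_length):
    """Fraction of the chromosome covered by the union of the segments."""
    covered = sum(end - start for start, end in merge_segments(segments))
    return covered / chromosome_length

# Hypothetical segments (in cM) assigned to one ancestral couple through
# three different descendants; the numbers are invented for illustration.
cousin_segments = {
    "cousin_A": [(0.0, 25.0)],
    "cousin_B": [(20.0, 40.0), (90.0, 110.0)],
    "cousin_C": [(105.0, 130.0)],
}
all_segments = [seg for segs in cousin_segments.values() for seg in segs]
print(merge_segments(all_segments))             # [(0.0, 40.0), (90.0, 130.0)]
print(coverage_fraction(all_segments, 200.0))   # 0.4
```

For simplicity this treats the chromosome as a single line; a real reconstruction would also need to track which member of the couple each segment belongs to, and which of that ancestor's two copies of the chromosome it came from.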

While it is possible to use known, close autosomal DNA matches for chromosome mapping and assign segments of DNA to our ancestors to about the fifth or sixth generation, it is much more difficult to map segments to more distant ancestors. The first problem is that our family trees become more difficult to research as we go further back in time. Two fifth cousins will share their great-great-great-great-grandparents in common. However, we have 64 great-great-great-great-grandparents. Very few people are able to identify all 64 of them, and only a minority of family historians are able to identify all of their 32 great-great-great-grandparents. It therefore becomes very difficult to conclude that the match is on the specific line of interest and that we are not matching because of shared descent on a different line which we haven't yet researched. In addition, because of the random way in which autosomal DNA is inherited, the relationship predictions become less reliable for the more distant relationships. The companies will therefore give you a range of relationships within which the match is likely to fall rather than a precise relationship. For example 23andMe assigns the more distant relationships as third to distant cousin or fourth to distant cousin. At FTDNA the more distant relationships are split into fourth cousins to remote cousins and fifth cousins to remote cousins. AncestryDNA gives predictions for fourth to sixth cousins or fifth to eighth cousins.

The second problem is that as we go further back in time we start to find some ancestors from whom we have inherited no DNA at all. While we will probably have inherited DNA from all of our 32 great-great-great-grandparents, there might be one or two of our 64 great-great-great-great-grandparents from whom we have not received any DNA at all. For a good explanation of this process see the blog post by Graham Coop on How many genetic ancestors do we have? See also the useful table by Bob Jenkins in his article How many genetic ancestors do you have? What this means is that once you go back beyond about 10 generations (roughly 300 years) only a small fraction of your ancestors have contributed directly to your DNA.[1] If you wish to triangulate a match with a fourth or more distant cousin you must first of all hope that both of you have inherited some DNA from the ancestor of interest. You must also hope that both of you have inherited the same segment of DNA on the same chromosome. As we are only likely to share one segment with a fifth cousin, if we share any DNA at all, and there are 22 autosomes to choose from, the chances that you will both share a segment on the same chromosome are very slim indeed. It has been estimated that when fifth cousins do have any detectable sharing that is identical by descent (IBD), it will usually be composed of a single segment with a mean length of 8.3 cM (∼8 Mb).[2]
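The idea that some ancestors leave us no DNA can be put into rough numbers. The sketch below is a back-of-the-envelope model in the spirit of the Graham Coop post mentioned above, not a reproduction of it. It assumes 22 autosomes, about 33 crossovers per meiosis (an assumed round figure), and that the number of segments inherited from a given ancestor is roughly Poisson distributed. Here k = 1 is a parent, k = 2 a grandparent, and so on.

```python
import math

def expected_blocks(k, crossovers_per_meiosis=33, autosomes=22):
    """Rough expected number of segments inherited from one ancestor k generations back.

    Your maternal (or paternal) half of the genome starts as `autosomes`
    chromosomes and gains roughly one extra break per crossover in each
    earlier generation, so k generations back it is cut into about
    autosomes + crossovers * (k - 1) pieces, shared among 2 ** (k - 1)
    ancestors on that side.
    """
    return (autosomes + crossovers_per_meiosis * (k - 1)) / 2 ** (k - 1)

def prob_no_dna(k):
    """Poisson approximation to the chance that an ancestor k generations back left you no DNA."""
    return math.exp(-expected_blocks(k))

for k in (5, 6, 10, 12):
    print(f"k={k}: {2 ** k} ancestors, chance each left some DNA ~ {1 - prob_no_dna(k):.3f}")
```

By this rough model each of the 64 ancestors at k = 6 has only about a 0.3% chance of having left no DNA, whereas at k = 10 the chance is a little over 50%. The exact figures depend on the assumed crossover count.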

All three testing companies have provided percentages showing the chances of matching a known cousin at the differing degrees of relationship. I've compiled the statistics into the table below.[3]

Relationship | 23andMe (unphased) | Family Finder (unphased) | AncestryDNA (phased)
2nd cousin | > 99% | > 99% | 100%
3rd cousin | ~ 90% | > 90% | 98%
4th cousin | ~ 45% | > 50% | 71%
5th cousin | ~ 15% | > 10% | 32%
6th cousin or more distant | < 5% | Remote (typically less than 2%) | 11%

AncestryDNA phases the genotypes before doing the matching process. (Phasing is the process of assigning alleles to the maternal and paternal chromosomes and will be discussed in more detail in the second article in this series.)  As can be seen from the table, phasing provides a better chance of matching at the fourth and fifth cousin level, but even with phased data it is clear that the odds of two fifth or sixth cousins sharing enough DNA on a specific line to show up as a match are still very slim.
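Phasing is explained properly in the second article, but the basic idea can be sketched for the simplest case, in which both parents have been tested (trio phasing). This is a minimal illustration under stated assumptions: genotypes are written as two-letter strings such as "AG", each SNP is treated independently, and the genotype data are invented. It is not a description of the method AncestryDNA uses.

```python
def phase_site(child, mother, father):
    """Assign a child's two alleles at one SNP to the maternal and paternal copies.

    Genotypes are two-character strings such as "AG". Returns a
    (maternal_allele, paternal_allele) tuple, or None when the parents
    cannot resolve the order (e.g. all three are heterozygous) or are
    inconsistent with the child's genotype.
    """
    options = {
        (m, p)
        for m, p in ((child[0], child[1]), (child[1], child[0]))
        if m in mother and p in father
    }
    return options.pop() if len(options) == 1 else None

def phase_trio(child, mother, father):
    """Phase a list of SNPs, using '?' where a site cannot be resolved."""
    maternal, paternal = [], []
    for c, m, f in zip(child, mother, father):
        result = phase_site(c, m, f)
        maternal.append(result[0] if result else "?")
        paternal.append(result[1] if result else "?")
    return "".join(maternal), "".join(paternal)

# Hypothetical genotypes at four SNPs.
child = ["AG", "CC", "AG", "TC"]
mother = ["AA", "CT", "AG", "TT"]
father = ["GG", "CC", "AG", "CC"]
print(phase_trio(child, mother, father))  # ('AC?T', 'GC?C')
```

Sites where the child and both parents are all heterozygous stay unresolved ("?") in this simple approach.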

Although the odds of matching a specific fifth or sixth cousin are actually very low, because we have so many fifth, sixth and more distant cousins these more distant relationships will dominate our match lists. Henn et al (2012) produced a model to estimate the expected number of cousins at different degrees of relationship, and the figure is reproduced below under a Creative Commons Licence.[4]


A model produced by researchers at AncestryDNA, based on birth and census data from the last 200 years, produced some rather different statistics. They found that a typical British person had "five first cousins, as well as 28 second, 175 third, 1,570 fourth, 17,300 fifth, and 174,000 sixth cousins" making a grand total of 193,000 living cousins. Whatever the numbers might be, it is clear from the maths alone that because we all have such huge numbers of seventh, eighth and more distant cousins, the vast majority of our more distant matches are much more likely to fall in this range than to be fifth or sixth cousins.
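The point that distant cousins dominate match lists can be checked with simple arithmetic: multiply the number of cousins at each degree by the chance of detecting one. The sketch below combines two sets of figures quoted in this article, the AncestryDNA cousin-count model for a typical British person and the AncestryDNA detection rates from the table, and assumes, unrealistically, that every cousin has tested. Only the relative sizes of the results are meaningful.

```python
# Cousin counts for a typical British person (AncestryDNA model quoted above).
cousins = {2: 28, 3: 175, 4: 1570, 5: 17300, 6: 174000}
# AncestryDNA's chance of detecting a cousin of each degree (table above;
# the 11% figure is for sixth cousins or more distant).
detect = {2: 1.00, 3: 0.98, 4: 0.71, 5: 0.32, 6: 0.11}

def expected_matches(cousins, detect):
    """Expected detected matches per cousin degree, assuming every cousin has tested."""
    return {degree: cousins[degree] * detect[degree] for degree in cousins}

for degree, n in expected_matches(cousins, detect).items():
    print(f"cousin degree {degree}: about {n:,.0f} matches")
```

Even with the lowest detection rate, the sixth-cousin row gives by far the largest expected number, and cousins beyond the sixth degree are not even included.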

Pedigree collapse and endogamy
Relationship predictions can be confounded by recent pedigree collapse. This is the phenomenon whereby the same ancestral couple appears twice or more in your family tree. For example, if your parents were first cousins you would have six great-grandparents rather than eight and 48 great-great-great-great-grandparents instead of 64. If your parents were second cousins you would have 14 great-great-grandparents instead of 16, and 56 great-great-great-great-grandparents instead of 64. This means that you will inherit more DNA from the ancestors who appear twice on your family tree, and there is a greater chance that their DNA will be preserved.
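Pedigree collapse can be counted directly by walking up a family tree and keeping only distinct people at each generation. The sketch below uses a hypothetical pedigree in which the parents are first cousins; any person not listed is given unique placeholder parents, so only the deliberately shared couple (gg1 and gg2) collapses. All names are invented for illustration.

```python
def parents_of(person, pedigree):
    """Return (father, mother); people not in the pedigree get unique placeholder parents."""
    return pedigree.get(person, (person + "/F", person + "/M"))

def distinct_ancestors(person, generations, pedigree):
    """Count the distinct ancestors exactly `generations` generations back."""
    current = {person}
    for _ in range(generations):
        current = {p for x in current for p in parents_of(x, pedigree)}
    return len(current)

# Hypothetical pedigree: "me" has parents who are first cousins, because
# dad's father and mum's mother are siblings (both children of gg1 and gg2).
pedigree = {
    "me": ("dad", "mum"),
    "dad": ("dad_f", "dad_m"),
    "mum": ("mum_f", "mum_m"),
    "dad_f": ("gg1", "gg2"),
    "mum_m": ("gg1", "gg2"),
}
print(distinct_ancestors("me", 3, pedigree))  # 6 great-grandparents, not 8
print(distinct_ancestors("me", 6, pedigree))  # 48, not 64
```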

Endogamy is another confounding factor for relationship predictions. Endogamy is the practice of marrying within the same ethnic, cultural, social, religious or tribal group. Sometimes endogamy is enforced as a result of geographical isolation. Within an endogamous group there are multiple marriages between first, second and third cousins and everyone is effectively related to everyone else multiple times over within a very recent timeframe. Ashkenazi Jews are one example of an endogamous population. They can be traced back to a recent bottleneck with an effective population size of about 350 between 25 and 32 generations ago. The bottleneck was followed by rapid exponential expansion.[5] With autosomal DNA tests, people who are descended from an endogamous population will have significantly more matches than someone from a non-endogamous population. They will have a larger total cM count with their matches and will share more segments in common.[6] As an example, I have no recent endogamy in my family tree, and I have 526 Family Finder matches at Family Tree DNA. In contrast, a British Jewish friend of mine now has 6150 matches.
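Why being related "multiple times over" confounds predictions can be seen by adding up the contribution of each genealogical path. A minimal sketch, assuming the standard path-counting approach (each route through a common ancestor contributes one half per generation step on both sides) and ignoring any inbreeding of the common ancestors themselves. The relationships in the example are hypothetical.

```python
from fractions import Fraction

def expected_sharing(paths):
    """Expected fraction of autosomal DNA shared, summed over genealogical paths.

    Each path is (generations from person 1 up to a common ancestor,
    generations from person 2 up to that ancestor). Each member of an
    ancestral couple is listed as a separate path. Assumes the common
    ancestors are not themselves inbred.
    """
    return sum(Fraction(1, 2) ** (up1 + up2) for up1, up2 in paths)

first_cousins = [(2, 2), (2, 2)]               # one shared grandparent couple
second_cousins = [(3, 3), (3, 3)]
double_second_cousins = second_cousins * 2     # related through two couples
first_cousins_once_removed = [(2, 3), (2, 3)]

print(expected_sharing(first_cousins))               # 1/8
print(expected_sharing(double_second_cousins))       # 1/16
print(expected_sharing(first_cousins_once_removed))  # 1/16
```

Two people who are second cousins twice over are expected to share as much DNA as first cousins once removed, so on DNA evidence alone they look more closely related than either single relationship would suggest. In an endogamous population many such extra paths accumulate.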

Conclusion
We have seen how autosomal DNA triangulation can be a very useful tool when DNA evidence is combined with sound genealogical research to draw conclusions about close relationships up to about the fourth or fifth cousin level. In the second and final article of this series I will look at the interesting phenomenon of triangulated segments of DNA: segments of DNA which appear to be shared by multiple people descended from a single common ancestor. But do these segments have any genealogical relevance?

See also
Part 2: Autosomal DNA triangulation – the phenomenon of triangulated segments

Useful resources

- The Autosomal DNA Portal in the ISOGG Wiki
- Genetic genealogy and the single segment: a blog post from geneticist Steve Mount with some interesting insights into autosomal DNA matches
- Expand and support your research with AncestryDNA Circles: an excellent presentation from computational biologist Dr Ross E Curtis which explains the basics of autosomal DNA inheritance and the methodology behind AncestryDNA's Circles feature. If you have tested at Ancestry, also check out the articles in the "Getting started with DNA Circles" menu.
- AncestryDNA's DNA Circles White Paper

Footnotes and references
1. See also Speed D and Balding DJ (2015). DNA and pedigree ancestors. Supplement S2 for the paper Relatedness in the post-genomic era. Nature Reviews Genetics 16: 33-44.
2. Browning SR, Browning BL (2012). Identity by descent between distant relatives: detection and applications. Annual Review of Genetics 46: 617-33.
3. The 23andMe data was extracted from the FAQ The probability of detecting different types of cousins. The FTDNA data was taken from the Learning Center article What is the probability that my relative and I share enough DNA to be detected by Family Finder? The AncestryDNA data was extracted from Table 1 in the help article "Should other family members get tested?" This is available to AncestryDNA customers only and can be accessed through the AncestryDNA "Matching Help and Tips" menu.
4. Henn BM, Hon L, Macpherson JM, Eriksson N, Saxonov S et al (2012). Cryptic distant relatives are common in both isolated and cosmopolitan genetic samples. PLoS ONE 7(4): e34267.
5. Carmi S, Hui KY, Kochav E et al (2014). Sequencing an Ashkenazi reference panel supports population-targeted personal genomics and illuminates Jewish and European origins. Nature Communications 5: 4835.
6. Paull JM, Tannenbaum GS, Briskman J (2014). Why autosomal DNA test results are significantly different for Ashkenazi Jews. Avotaynu XXX (1): 12-18.

© 2016 Debbie Kennett

18 comments:

Building Magic said...

"I will look at the interesting phenomenon of triangulated segments of DNA – segments of DNA which appear to be shared by multiple people descended from a single common ancestor. But do these segments have any genealogical relevance?"

It depends on the strength of the evidence. E.g., triangulated segments of DNA 20-30 cM or larger will be much more helpful in determining shared ancestry than segments that are smaller than 10 cM.

IsraelP said...

I, for one, don't believe that "350 Jews in Europe" business, but I refer to it to show how estimations of such things differ so extremely from one another.

There was a significant expansion in the generations following the expulsions from Spain and Portugal in the 1490s, and many of those expellees who were taken in by the Turks, the Italians and other Mediterranean populations eventually moved north, where they (WE!) think they were always and forever of Ashkenazic stock.

Unknown said...

Dear Debbie, Thanks for a really good and well detailed blog on such a complicated process. With your permission I would like to add this blog to the data I share with new members in my FTDNA projects expressing an interest in Family Finder testing. Many of the new matches to my seven autosomal kits have so many questions that you have covered so well here. Please also write on the X Chromosome and any latest info on this most difficult of autosomal research IMO. Thanks for all your volunteer work with the ISOGG Wiki. LASM

Debbie Kennett said...

Jason, I will be publishing my second blog post soon and will cover the issue of triangulated segments. It's not the size of the segments that's important but the number of people who share the segments. The more people who match on the same segment the more distant the common ancestor is likely to be.

Israel, The figure of 350 Jews in Europe is what is known as the effective population size. Click on the link in my blog post to understand what this means. It's an idealised theoretical figure used by population geneticists and is not a reflection of the actual census population at that time. The extrapolation is made purely from the dataset used in the article so if a bigger dataset were used they would come up with a different answer.

Linda, Thank you for your kind words. Do feel free to share this blog post or any other blog post that I've written, with your matches or your project members. I haven't yet done much work on the X-chromosome. Kathy Johnston is the expert on the X-chromosome. I think it's difficult to use the X-chromosome at present for anything other than close family matches. AncestryDNA don't use the X-chromosome for their matches and FTDNA only report X-chromosome matches if the two people already meet the autosomal DNA threshold. Consequently many true matches are missed, and females get lots of irrelevant false matches on small segments. You have to get people to upload their results to GedMatch, which they often don't want to do.

Unknown said...

Dear Debbie,

I really enjoyed your article. Just looking for clarification on your response to Jason's comment - "It's not the size of the segments that's important but the number of people who share the segments. The more people who match on the same segment the more distant the common ancestor is likely to be."

If I find 5 people that I match with on the same chromosome, within the same segment, and they all match each other, does that mean the match is more distant? For some reason I thought it might mean closer... Does the size of those segments affect the distance of the relationships? Thanks!

Janet

Debbie Kennett said...

Janet

I've just published the second article in my two-part series which I hope might answer your question. I think it's the other way round. The more people who match on the same segment the more distant the match is likely to be. According to the scientific literature haplotype frequency is the main determinant of the age of the segment. In general, longer segments are usually more recent but it doesn't always follow. If it's a long segment and matches lots of people then it's more likely to be distant than recent.

Best wishes

Debbie

Unknown said...

Debbie, very well written blog article, glad I heard about it through the DNA Newbie Yahoo group. I appreciate your thoughtful and comprehensible posts there, and I managed to find a copy of your book, as well!

Debbie Kennett said...

Thank you D R Hunter for your kind words. I hope you like my book. There have been a lot of changes since that book was written so some of it will not apply though the basic principles are still sound.

Jo Henn said...

This is quite an interesting and helpful post. Thank you for sharing it. I wanted to let you know that I have included it and your Part 2 post in my NoteWorthy Reads post: http://jahcmft.blogspot.com/2016/03/noteworthy-reads-26.html

Debbie Kennett said...

Thank you Jo. That's very kind of you.

Unknown said...

Mabuhay from the Philippines. I am a genealogist with a dilemma that I hope Debbie and others may be able to give advice on.

I did some research on my friend and neighbor's family and created a family pedigree for her. She is a Filipino whose mother was from the Philippines and father was an American gold miner here in the Philippines. I was quite successful in my research and was able to determine how my friend's father died at the hands of the Japanese Military Police during WWII (a very interesting story indeed). I also located all of her father's living relatives in the US, who until now had never learned the exact fate of their ancestor.

To make a long story short, the family in the US wants proof through DNA testing that my friend is a relative before they will make contact with her. So her father's grandson and my friend had autosomal DNA testing. We are awaiting the results. My lingering question is whether this test will prove my friend is a relative. The grandson is the son of my friend's father and his first wife. So my friend is the grandson's half-aunt.

I read about the avuncular DNA test, which is usually used for paternity testing when the father is unavailable. However, I do not think this test is suitable in this case because my friend and the other DNA test taker's father are only half-siblings.

What is your take on this? I will appreciate your input greatly.

Debbie Kennett said...

Hello Mabuhay

Autosomal DNA testing could indeed help in this particular case so long as the right test has been taken. The tests offered by paternity testing companies would not be very helpful and would not necessarily give a definitive answer. These tests look at autosomal markers known as STRs (short tandem repeats), and only a handful of such markers are tested (about 15, 20 or so depending on which company is used). However, if the testing has been done with one of the genetic genealogy companies (Family Tree DNA, AncestryDNA or 23andMe) then there would be a definitive answer. These tests do comparisons using around 700,000 markers. They test a different type of marker known as a SNP (single nucleotide polymorphism). A half-aunt and a half-nephew would share around 12.5% of their DNA in common and one of these tests would confirm this relationship. Family Tree DNA sell their Family Finder test in the Philippines but, as far as I'm aware, neither 23andMe nor Ancestry sell their tests in the Philippines at present.

Unknown said...

Thanks a lot, Debbie. My friend and her half-nephew had the ancestry.com autosomal test done. So I am reassured and pleased that we took the right one!

BTW, your two articles were quite helpful in understanding presently available genealogical DNA testing, I had resisted really getting into the nitty gritty of DNA up until now. However, your articles made it quite understandable. I have sent them to my friends and family who are into genealogy and they are also elated with how easily you have explained such a complicated issue. You have done a great service here!

Thanks again.

Saro Genova/Queenie Makulit

Debbie Kennett said...

Hi Queenie

I'm glad to be of help. Thank you for your kind words.

When you get your AncestryDNA results you might find it helpful to upload your results to GedMatch. This will allow you to look at the matching segment data and see which chromosomes you match on (assuming of course that you do actually have a match). You can find information about GedMatch in this ISOGG Wiki article:

http://www.isogg.org/wiki/Autosomal_DNA_tools

It used to be possible to transfer AncestryDNA results to Family Tree DNA. Hopefully this facility will be restored in the near future. You can find further information here:

https://www.familytreedna.com/AutosomalTransfer

Michelle said...

Any idea why Ancestry claims better chances of finding matches in say 3rd cousins? If unphased analysis finds 90% of genealogical 3rd cousins to be matches, I would expect phasing to only be able to reduce that number further, because many of the unphased matching segments will drop out as false positives when phased.

Or is Ancestry basing their findings on much shorter segments than the others, because they can more readily distinguish false and authentic short segments?

Debbie Kennett said...

Hi Michelle

I suspect the reason why Ancestry claim a higher success rate is because they have lower match thresholds. Because they are using phasing they are better able to detect the smaller segments under 10 cMs. This ISOGG Wiki page shows the thresholds that the three companies use:

http://isogg.org/wiki/Autosomal_DNA_match_thresholds


However, no one has ever put these claims to the test.

Family Sleuther said...

Hi Debbie,

Thank you for directing me to your primer on autosomal triangulation. While I understood the odds were not in my favor for triangulating 5th cousin and 5C1R matches, this overview (and the linked external resources) are helpful in better grasping the varying factors that contribute to that challenge.

On to Part II!

Mike

Debbie Kennett said...

Hi Family Sleuther

It's precisely because the odds are not in favour of triangulation that alternative explanations should be considered first. One difficulty is that we can't easily check for segment frequency to see if lots of people are piling up on this segment. There is a triangulated segment tool at GEDmatch but you need to get everyone to upload their results first.

The amount of shared DNA is still very small and it's still far more likely that this DNA is being shared because of a very distant relationship rather than a recent genealogical relationship.

This blog post from Graham Coop and Peter Ralph is well worth reading:

https://gcbias.org/2013/05/10/identification-of-genomic-regions-shared-between-distant-relatives/

Also check out the ISOGG Wiki page on IBD and particularly the figure on that page reproduced from a paper by Speed and Balding:

https://isogg.org/wiki/Identical_by_descent