Thursday 28 January 2016

Autosomal DNA triangulation. Part 2: the phenomenon of triangulated segments

In the first article in this two-part series I covered the basics of autosomal DNA inheritance and how triangulation, in combination with chromosome mapping, can be successfully used to assign segments to specific ancestors within the last five generations or so. I'm now going to take a look at the interesting phenomenon of "triangulated segments": segments of DNA that are shared in common with multiple people.

We would naturally expect several of our close relatives (grandparents, aunts and uncles, and first cousins) to share some segments in common but once you get out to the fifth or sixth cousin level the chances of sharing any DNA with a specific cousin are very low. If you do match a fifth cousin or a more distant cousin you would only expect to share a single IBD segment.  Despite the low odds, because we have so many distant cousins, we will inevitably find some rare examples of people who match three or more fifth cousins who are descended from the same ancestral couple. This is much more likely to happen if the couple had a large family and their descendants also went on to have lots of children. However, because we all inherit different combinations of segments of DNA from our ancestors you would expect to match other fifth and sixth cousins on different segments rather than all on the same segment. We all inherit different pieces of our ancestors' genetic jigsaw puzzle rather than all sharing the same piece.

While I'm not aware of any scientific papers that have studied the frequency of triangulated segments, AncestryDNA have done some interesting computer simulations which shed some light on the matter. They found that a group of three first cousins shared a matching segment of over 5 cMs in length over 80% of the time. Three second cousins shared the same segment around 60% of the time, but for third cousins the rate was just 15%. The chances that three fourth cousins would all share the same matching segment were found to be around 1% (see figure below). From this we can infer that it would be similarly extremely unlikely for three or more fifth, sixth or more distant cousins all to match on the same segment through IBD descent from a specific ancestor. This of course assumes that they have inherited enough DNA from their mutual ancestor to show up as a match at all.
Figure reproduced by kind permission of AncestryDNA from the customer help article "Do all members of a DNA Circle share the same matching segment?"
The results of the AncestryDNA simulations have been replicated in the real-life findings from their DNA Circles feature. They say:
Data we've gathered from our DNA Circles also shows that it is unlikely to find matching segments among three or more people. Only 4 percent of DNA Circles (that have between 3 and 30 members) have three or more people that share the same matching segment. In the remaining 96 percent of DNA Circles, no more than two people in the Circle share any one particular segment. In other words, even in DNA Circles with 30 descendants, usually only two or three descendants will all inherit the same segment.1
The results of the AncestryDNA research into triangulated segments have not been published in a peer-reviewed scientific journal. We don't know the assumptions that were made in the models and we have to accept the results in good faith. We also don't have access to the shared segment data for our matches at AncestryDNA, so the findings cannot be independently verified. However, AncestryDNA do have a good team of scientists working for them. I'm also not aware of any credible research that has disproved their findings.

In contradiction to what we might expect from the AncestryDNA findings, there is a lot of anecdotal evidence that people are seeing lots of matches which all fall on the same segment on the same chromosome. We don't have much data on the extent of the phenomenon in terms of the number of people in each "triangulated group" and the size of the segments that they share. The situation is also complicated because each company has a different matching threshold, which means that we might not see the full extent of the problem.

The hypothesis has been proposed that if multiple people share the same segment and they all match each other then they must all share a recent genealogical ancestor. It is then just a question of comparing family trees and trying to find surnames in common. If you match lots of people on the same segment this process of "triangulation" should in theory be fairly straightforward because it's just a question of looking for recurring surnames and locations in common, and the more family trees you have to compare the easier this should be. However, there are many pitfalls with this approach as we will see.

When I look at my own data I can see that on every chromosome I have at least one segment where I appear to match multiple people. This scenario is best visualised by using Don Worth's Autosomal DNA Segment Analyser, a free utility which is available from the DNAGedcom website. ADSA uses the "in common with" files from Family Tree DNA. This is not "true" triangulation, which requires checking that the people you match also match each other. However, it is in most cases a pretty reasonable approximation. The screenshot below shows the most extreme example from my own data which occurs on chromosome 18. As can be seen, I have a big group of people who all overlap in the same region, though the amount of sharing is quite small and in most cases under 10 cMs. (On the ADSA diagram below my mum's sharing is shown in black and my dad is shown in pink.)
However, I have not been able to identify a common ancestor or even a common surname with any of the people in my own "triangulated groups". It doesn't help that most of the people in these groups are in America with all-American ancestry and no indication of places of origin in the UK. I do have some known relatives who emigrated to America and Canada in the nineteenth and twentieth centuries but I have not come across any other emigrants further back in my family tree. You would have thought that if I was related to lots of people who had emigrated to America in the last 400 years it would have been possible to identify at least some of the connections. This rather suggests to me that if these segments are IBD, they are a signal of very distant shared ancestry that perhaps predates the colonisation of America. I have one triangulated group that looks distinctly Irish in flavour, and I do have some Irish ancestry but Irish research prior to 1800 is difficult at the best of times and especially so when the only surname you've identified is Sullivan!
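The difference between overlapping "in common with" matches and "true" triangulation can be made concrete with a small Python sketch. Everything here is hypothetical: the names, the segment positions and the 7 cM cut-off are invented for illustration, and this is not how ADSA or any company's tools work internally.

```python
from itertools import combinations

# Hypothetical segments (start_cM, end_cM) that each match shares with me
# on one chromosome. Names and positions are invented.
my_matches = {
    "Alice": (10.0, 22.0),
    "Bob": (12.0, 25.0),
    "Carol": (14.0, 21.0),
}

# Hypothetical segments that the matches share with each other on the same
# chromosome. A missing pair means no match was reported between them.
pairwise = {
    frozenset(["Alice", "Bob"]): (12.0, 22.0),
}


def overlap(a, b):
    """Length in cM of the overlap between two segments (0 if none)."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def triangulates(p, q, min_cm=7.0):
    """True if p and q overlap on my segments AND also match each other there.

    min_cm is an assumed threshold, not a value taken from any company.
    """
    shared_with_me = (
        max(my_matches[p][0], my_matches[q][0]),
        min(my_matches[p][1], my_matches[q][1]),
    )
    if shared_with_me[1] - shared_with_me[0] < min_cm:
        return False  # the two matches do not overlap enough on my segments
    between = pairwise.get(frozenset([p, q]))
    return between is not None and overlap(between, shared_with_me) >= min_cm


for p, q in combinations(sorted(my_matches), 2):
    print(p, q, triangulates(p, q))
```

In this toy data all three people overlap with me in the same region, but only Alice and Bob also match each other, so only that pair triangulates. Even then, a segment that triangulates in this sense tells us nothing about how far back the shared ancestor lived.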

My own "triangulated segments" are all quite small, but there is anecdotal evidence of people who have matches with large numbers of people on the same segments and where the shared segments are much larger than the ones I'm seeing. Some people have observed "triangulated groups" with segments over 20 cMs in size, which in theory should be more recent in origin and where it should be much easier to identify a common ancestor. But if it really is so difficult for three or more fourth, fifth or more distant cousins to match on the same segment, why is it that we are seeing so many examples of this happening? I can offer a few suggestions that might help explain this phenomenon.

Lack of phasing
The first difficulty when considering our matches is that our data at Family Tree DNA and 23andMe is not phased. Phasing is the process of sorting out the DNA letters we receive from our parents and assigning them to the maternal and paternal chromosomes. Our autosomal chromosomes come in pairs. We receive one set of 22 autosomes from our mum and another set from our dad. If you look at your raw data you'll see that for each chromosome you have a long list of As, Cs, Ts and Gs divided into two columns. However, the columns with our data for each chromosome aren't conveniently sorted so that all the DNA letters in one column represent all the letters you got from your dad and all the letters in the other column are the letters you got from your mum. The letters are all jumbled up so both columns are a mishmash of all the As, Cs, Ts and Gs that you get from both your parents. The computer algorithms are looking for consecutive runs of As, Cs, Ts and Gs that all match each other but they're looking in both columns to find the matches. If the algorithms find enough matching letters in a run then we can be reasonably certain that the segment is IBD – a true match inherited through successive unbroken generations from grandparent, to child to grandchild and so on. Strictly speaking what we are seeing are not segments of DNA but sets of alleles that form haplotypes.

Phasing matters most with the smaller segments under 15 cMs where there is a law of diminishing returns. As the segments get smaller the chances that the segments will be false positive pseudosegments (mishmashes of As, Cs, Ts and Gs from both the maternal and paternal chromosome) will tend to increase. Independent research from genetic genealogists suggests that 15 cM is the threshold where segments can be assumed to be IBD with reasonable confidence, whereas only 42% of 7 cM segments are likely to be IBD. Even when phasing is done there is still the possibility of false matches with the smaller segments. A study by Durand et al (2014) found that over 67% of phased 2-4 cM segments were false positives (matches found in the child but not in the parents).2
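To show how a pseudosegment can arise from unphased data, here is a toy Python sketch. The haplotypes are invented, the parents and the stranger are assumed to be homozygous at every SNP purely to keep the example short, and real matching algorithms are far more sophisticated than this.

```python
def half_identical(g1, g2):
    """True if two unphased genotypes share at least one allele at a SNP."""
    return bool(set(g1) & set(g2))


def longest_run(person_a, person_b):
    """Length, in SNPs, of the longest consecutive half-identical run."""
    best = current = 0
    for g1, g2 in zip(person_a, person_b):
        current = current + 1 if half_identical(g1, g2) else 0
        best = max(best, current)
    return best


# Invented haplotypes for eight consecutive SNPs.
maternal = "AAAACCCC"  # the copy the child inherited from mum
paternal = "GGGGTTTT"  # the copy the child inherited from dad

# Unphased genotypes: each SNP is just a pair of letters.
child = [m + p for m, p in zip(maternal, paternal)]
mum = [m + m for m in maternal]   # simplifying assumption: homozygous
dad = [p + p for p in paternal]   # simplifying assumption: homozygous

# A stranger whose letters agree with the maternal copy for the first four
# SNPs and with the paternal copy for the last four.
stranger = [a + a for a in "AAAATTTT"]

print("child vs stranger:", longest_run(child, stranger))
print("mum vs stranger:  ", longest_run(mum, stranger))
print("dad vs stranger:  ", longest_run(dad, stranger))
```

The child appears to match the stranger across all eight SNPs, yet neither parent matches the stranger for more than four. The apparent match zigzags between the maternal and paternal chromosomes, which is the kind of false positive that phasing is meant to remove.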

The most accurate phasing is done with parent/child trios. There are various computer programs and third-party tools that will do this (eg, the GedMatch tools and David Pike's tools). However, this sort of analysis is something that only the most advanced genetic genealogists are likely to undertake, and even if you were to phase your own data none of the companies currently provide the facility for you to use a phased genotype. It is also possible to do algorithm-based phasing from reference sequences, and this can be done with a very high degree of accuracy. AncestryDNA use this type of population-based approach, but they have developed their own sophisticated proprietary program. The error rate for the AncestryDNA phasing engine is only about 1% when compared with parent/child trios. However, phasing is computationally challenging and expensive, and AncestryDNA are currently the only company who are able to do this. In theory the matches we get from AncestryDNA should be much more accurate than the matches we get from 23andMe and Family Tree DNA.
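The basic logic of phasing with a parent/child trio can be sketched in a few lines of Python. This is a minimal illustration under my own assumptions (genotypes written as two-letter strings, one SNP at a time, no handling of missing data); it is not the method used by GedMatch, David Pike's tools or AncestryDNA.

```python
def phase_snp(child, mother, father):
    """Return (maternal_allele, paternal_allele) for one SNP.

    Each argument is an unphased genotype such as "AG". Returns None when
    the SNP cannot be phased from the trio alone, or when the genotypes are
    inconsistent with the stated parents.
    """
    a, b = child
    if a == b:
        return (a, b)  # homozygous child: both copies carry the same letter
    # Try both assignments and keep those that are possible given the parents.
    options = [
        (m, p) for m, p in ((a, b), (b, a)) if m in mother and p in father
    ]
    return options[0] if len(options) == 1 else None


print(phase_snp("AG", "AA", "AG"))  # mum can only have given A
print(phase_snp("AG", "AG", "AG"))  # everyone heterozygous: ambiguous
print(phase_snp("CC", "CT", "CC"))  # homozygous child
```

The ambiguous case, where child and both parents are all heterozygous at the same SNP, is the one that trio phasing cannot resolve on its own.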

It has been suggested that segments which "triangulate" must be IBD but I see no rationale for this assumption and, to the best of my knowledge, this hypothesis has not been tested. We already know that some small segments don't triangulate with close relatives. I have some examples in my blog post on Tracking DNA segments through time and space. If this can happen with small segments perhaps it can happen with large segments too.

Genotypes versus whole genome sequencing
The second point we have to consider is that the currently available autosomal DNA tests are not sequencing all six billion DNA letters in our genomes. The testing is done on an Illumina chip which looks at around 700,000 different letters scattered across the genome.3 This process is known as genotyping. The Illumina chip that all the companies use was designed for health purposes and not for genealogy. The SNPs that are included are those that are useful for genome-wide association studies (GWAS) where the goal is to look for SNPs shared at the population level not SNPs that are shared at the family level. The density of SNPs on the chip varies and some regions are better covered than others. Segments containing rare alleles are much easier to identify than segments with alleles which have a high frequency in the population. The segments that are used for matching purposes in autosomal DNA tests do not therefore provide a complete sequence of all the letters in the "segment" but merely a run of consecutive SNPs with many missing intervening letters. This introduces the possibility of errors, particularly for shorter segments. In addition, two separate segments could be stitched together to give the appearance of one single segment because the intervening SNPs that might break up the sequence aren't on the chip.

Shared descent through multiple ancestral pathways
We saw in my previous blog post how pedigree collapse and endogamy can affect relationship predictions within recent generations but pedigree collapse and endogamy affect all our family trees sooner or later. We are all endogamous. It is just a matter of degree. The number of ancestors doubles with every generation. You only have to go back 20 generations before you find that you theoretically have 1,048,576 genealogical ancestors. You eventually reach a point where your theoretical number of ancestors exceeds the entire number of people who have ever lived on the planet. We have a world population of over seven billion people but we all trace our ancestors back to a historical population of just one billion in 1850.
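The doubling is easy to check with a couple of lines of Python. This is just the arithmetic, counting your parents as one generation back and ignoring pedigree collapse entirely, which is of course why the numbers quickly become impossible.

```python
def theoretical_ancestors(generations_back):
    """Number of ancestor slots in a single generation, ignoring pedigree collapse."""
    return 2 ** generations_back


for n in (10, 20, 30, 40):
    print(n, f"{theoretical_ancestors(n):,}")
```

Ten generations back gives 1,024 slots, twenty gives 1,048,576, and by thirty generations the theoretical figure has already passed one billion.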

What this means in practice is that everybody is related to everybody else and we are all related much more recently than we intuitively realise. For many people this endogamy will not be documented in their family trees. The only example I can find in my own family tree of two ancestors who were already related when they married dates back to the late seventeenth century in North Molton, Devon. My ggggggg-grandparents Daniel Locke and Mary Bright were first cousins when they married in 1667. However, even though I cannot trace all the distant relationships it is an inescapable fact that all my ancestors who were marrying in rural villages in Devon, Somerset, Gloucestershire, Essex and Hertfordshire back in the 1700s must have been closely related to each other and were probably third, fourth, fifth and more distant cousins many times over. I also have lots of London ancestors and they would be more distantly related because of the sheer size of the London population and the fact that people migrated to London from all over Britain and elsewhere. Eventually all our ancestral lines will come together in a tangled and complex network of relationships connected on many different pathways. To get an idea of what our collapsed pedigrees might look like have a look at this wonderful 80-generation pedigree chart for a border collie dog showing 90% pedigree collapse.

Two peer-reviewed papers studying present-day populations have confirmed the mathematical predictions of our ubiquitous recent shared ancestry. Henn et al (2012) found tens of thousands of 2nd to 9th degree cousin pairs within a dataset of 5,000 Europeans. They also found that some highly endogamous populations such as Native Americans and the Kalash of Pakistan were effectively the genomic equivalents of second cousins.4 Ralph and Coop (2013) studied genomic data for a population of 2,257 Europeans. They found that "a pair of modern Europeans living in neighboring populations share around 2-12 genetic common ancestors from the last 1,500 years, and upwards of 100 genetic ancestors from the previous 1,000 years".5

These findings have also been replicated by AncestryDNA who found that their customers who are in DNA Circles were getting roughly the expected number of matches four and five generations ago with third and fourth cousins but progressively more matches than would be expected  with fifth and sixth cousins six and seven generations ago.
Figure reproduced by kind permission of AncestryDNA from the customer help article "Why do DNA Circles only go back six generations?"
AncestryDNA conclude:
Our research shows that descendants of an ancestor who lived more than six generations ago have more DNA in common with other descendants of that ancestor than they’d be expected to. This discrepancy increases the more generations you go back in time and suggests that descendants are actually related through multiple ancestors.6
If researchers are seeing such relatively high levels of IBD sharing in the present-day population we can assume that 300 years ago, when we trace back to a very much smaller population, everyone must have been much more closely related than we are today. We don't have a time machine so that we can travel back to the 1700s and get autosomal DNA tests done on all 1024 of our gggggggg grandparents. However, if we could, we might expect that a very high percentage of our ancestors would show up as matches to each other with many of the relationships being as close as fourth, third or second cousins. The effect would be compounded by the fact that the ancestral population of Europe went through an extreme genetic bottleneck in the fourteenth century when the Black Death wiped out over half of the population of Europe. The cumulative effect of this population structure is that the genomes of our ancestors 300 years ago would perhaps have the same characteristics as that of a highly endogamous population today. They would share more segments in common than would be expected for the degree of relationship and, if many of those ancestors were second or third cousins, then a number of them might be expected to match on the same segment. If there were lots of segments shared by many people in the historical population then it's easy to understand how these segments could also be found in their descendants today, but these segments would be passed on through a variety of different pathways, making it very difficult, if not impossible, to determine the individual lines of descent.

As an example, if you have ancestors in the 1700s, A, B, C, D and E, who all share the same 8 cM segment there are five possible pathways in which that segment could have been passed on to you. You might match a cousin who has Ancestors A, F, G, H and I in her tree who are similarly all related and share that same 8 cM segment. She too has five possible pathways in which that segment could have been passed down to her. It may be that you can both identify Ancestor A in your genealogical trees, and you assume that you are genetically related because you both share descent through Ancestor A. You each have a 1 in 5 chance of inheriting that segment from Ancestor A, but there is only a 1 in 25 chance that you have both inherited the same segment from Ancestor A. The most probable scenario is that you have both inherited the segment from different ancestors and neither of you has inherited the segment from Ancestor A. You will share a common genetic ancestor but that ancestor will be the progenitor of Ancestors A, B, C, D, E, F, G, H and I and not Ancestor A, and might well be beyond the reach of genealogical records.
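The arithmetic behind this example can be laid out in a short Python sketch. It rests on two simplifying assumptions that are mine rather than anything measured: each of the five pathways is treated as equally likely, and your inheritance and your cousin's are treated as independent.

```python
from fractions import Fraction


def pathway_probabilities(pathways):
    """Chances that a shared segment came via one named ancestor (Ancestor A).

    Assumes equally likely pathways and independence between the two cousins.
    """
    one_via_a = Fraction(1, pathways)
    both_via_a = one_via_a ** 2
    neither_via_a = (1 - one_via_a) ** 2
    exactly_one_via_a = 1 - both_via_a - neither_via_a
    return {
        "one person via A": one_via_a,
        "both via A": both_via_a,
        "exactly one via A": exactly_one_via_a,
        "neither via A": neither_via_a,
    }


for label, p in pathway_probabilities(5).items():
    print(f"{label}: {p} ({float(p):.0%})")
```

With five pathways each, the chance that both of you got the segment through Ancestor A is 1 in 25 (4%), while the chance that neither of you did is 16 in 25 (64%).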

It therefore seems likely that if lots of people match on the same segment this indicates that the segment is prevalent in the population from which they descend as a result of historical endogamy rather than an indication that they all share that same segment from a recent genealogical ancestor within the last 300 years or so. Indeed many of the tools produced by population geneticists use haplotype frequency as a way of detecting IBD. For example, Browning and Browning (2011) say “Haplotype frequency is critical because a shared common haplotype is unlikely to reflect recent IBD, whereas a shared haplotype that is very rare is likely to be identical by descent”.7

Pile-up regions
In addition to the problem of historical endogamy, which makes it very difficult to infer distant relationships with the currently available tests, it is also known that there are some regions of our genome which are prone to what the population geneticists call "excess IBD sharing" or what are more colloquially known as pile-up regions. These are segments which are widely shared at the population level. For a summary of some of the research into this subject see the section on excess IBD sharing in the ISOGG Wiki page on identity by descent. AncestryDNA use a proprietary algorithm known as Timber to filter out segments which occur at high frequency in the database. In some cases AncestryDNA found segments that were shared by thousands of people which suggested that these people weren't recently related to each other but shared DNA because they were descended from the same gene pool. There was not a direct correlation between the size and frequency of a shared segment and some of the segments that were filtered out using this method were quite large. See the blog post from Julie Granka on Filtering DNA matches at AncestryDNA with Timber which includes a table showing the size of the segments that were removed by this process. There is always going to be a trade-off between false positive and false negative matches and no algorithm is perfect. Some genetic genealogists who have tested parents and children at AncestryDNA have reported that up to 35% of their child's matches, including some fourth cousin matches, do not appear in the match list of either parent. This discrepancy has not yet been explained but appears to be related to the use of the Timber algorithm.

Conclusion
Autosomal DNA testing for genetic genealogy is still very much in its infancy, and we clearly have a lot to learn about the interpretation of results, particularly for endogamous communities and for the more distant relationships beyond the fourth or fifth cousin level where family trees start to get very patchy and where relationship predictions become more difficult. The lack of phasing at 23andMe and Family Tree DNA means that our matches, particularly those with the smaller segments, are unreliable and there are both false positives and false negatives. Many of the ambiguities in our results would disappear if we were to move to whole genome sequencing, and preferably with phased genotypes too. That is unlikely to happen in the next few years but will no doubt be routine at some point in the not too distant future.

In view of the known levels of historical endogamy in the human population and the almost impossible mathematical odds of multiple fifth and sixth cousins matching on the same IBD segment through descent from the same ancestral couple, I would suggest that any segment that triangulates with multiple distant cousins is unlikely to be indicative of a recent genealogical relationship. The common ancestor will probably have lived much further back in time and may well be beyond the reach of genealogical records. Our focus should perhaps instead be on all the rare haplotypes in our match list: the segments that we share with just one of our distant matches. I hope that it might be possible to find a way of testing some of these competing hypotheses.

In the meantime, it's very important that we don't jump to conclusions based on patterns seen in the data. In science and in genealogy it is important to look not for evidence that will prove your hypothesis but for evidence that will disprove your hypothesis.

Notes and references
1. AncestryDNA. Do all members of a DNA Circle share the same matching segment? An article in the AncestryDNA help menu "Learn more about DNA Circles" which is accessible to AncestryDNA customers only.
2. Durand EY, Eriksson N, McLean CY (2014). Reducing pervasive false positive identical-by-descent segments detected by large-scale pedigree analysis. Molecular Biology and Evolution 31(8): 2212-2222.
3. Both AncestryDNA and FTDNA test around 690,000 SNPs. The 23andMe v4 chip has around 577,000 SNPs.
4. Henn BM, Hon L, Macpherson JM, Eriksson N, Saxonov S, et al (2012). Cryptic distant relatives are common in both isolated and cosmopolitan genetic samples. PLoS ONE 7(4): e34267. See also the blog post from 23andMe How many relatives do you have? summarising the findings of this paper.
5. Ralph P, Coop G (2013). The geography of recent genetic ancestry across Europe. PLoS Biol 11(5).
6. AncestryDNA. Why do DNA Circles only go back six generations? From the AncestryDNA help menu "Learn more about DNA Circles" which is accessible to AncestryDNA customers only.
7. Browning BL and Browning SR (2011). A fast, powerful method for detecting identity by descent. American Journal of Human Genetics 88(2):173-82. See also: Browning SR, Browning BL (2012). Identity by descent between distant relatives: detection and applications. Annual Review of Genetics 2012; 46: 617-33. In the latter article the authors state: "The key idea behind IBD segment detection is haplotype frequency. If the frequency of a shared haplotype is very small, the haplotype is unlikely to be observed twice in independently sampled individuals, so one can infer the presence of an IBD segment. This criterion can be applied in several ways. The first is length of sharing, which is a proxy for frequency. If two densely genotyped haplotypes are identical at all or most (allowing for some genotyping error) assayed alleles over a very large segment of a chromosome, then the haplotypes are likely to be identical by descent across the whole segment. The second is direct use of haplotype frequency: Shared haplotypes with estimated frequency below some threshold are determined to be identical by descent. The third makes use of a population genetics model to infer probability of IBD. Given the frequency of the shared haplotype and a probability model for the IBD process along the chromosome, one can estimate the probability that the individuals are identical by descent at any position on the segment."

© 2016 Debbie Kennett

22 comments:

Building Magic said...

"In science and in genealogy it is important to look not for evidence that will prove your hypothesis but for evidence that will disprove your hypothesis."

We should apply the same skepticism to AncestryDNA's numerous unverified claims.

Debbie Kennett said...

Jason

What are the "numerous unverified claims" that you have in mind? As I've stated in my blog posts I would like to see AncestryDNA publish a paper on the David Speegle research. I would also like them to let us have the matching segment data so that we can independently verify their predicted matches. To their credit, AncestryDNA have been quite transparent about their methodology and have published three very interesting White Papers.

Jerry E. said...

Debbie,

First, thank you very much for the informative articles. The content of the articles explains why for the last several months I have been unable to find an MRCA between approximately 15 people that all match on a particular segment of Chromosome 2 of my mother's DNA. The common matches all overlap on this same segment and measure between 15 and 33 cMs. I have run the triangulation tool in Gedmatch and it confirms that all of the matches should have an MRCA. There are at least a half dozen people on FTDNA that have not uploaded their DNA results to Gedmatch that also match on this same segment. I have contacted a majority of the common matches and reviewed their family trees; however, I have yet to identify an MRCA. A number of the common matches have been categorized as third or fourth cousins by both Gedmatch and FTDNA. Most of these common matches that I could find on Ancestry are categorized as distant cousins with one categorized as a fourth cousin. Two of the families that are common matches have identified their MRCA and determined that they are sixth cousins. When I run the various tools on Gedmatch, the results indicate that these individuals are third cousins. It turns out that their MRCA born in the 1700s had two sons, one of whom had four daughters that married four brothers of the other son (very large families). In turn, it appears that several other ancestors of the MRCA were related. What makes it more interesting is that I was able to identify two MRCAs with one of the matching families on my father's side!

Debbie Kennett said...

Jerry

If the people in your "triangulated group" are matching on segments of different sizes then you have to remember that those segments will all have a different TMRCA. The fact that segments overlap is irrelevant. In general, people who match on a 7 cM segment will have a more distant common ancestor than people who match on a 33 cM segment. How many people share the 33 cM segment?

Unknown said...

Debbie,

I really appreciate the enormous amount of work that must have gone into these posts. I can see myself revisiting them repeatedly in the coming days as I re-evaluate my results in the light of your ideas.

Nicola

Debbie Kennett said...

Thank you Nicola. It took me a while to compile the two blog posts but I always find that writing helps me to clarify my thinking. Hopefully the posts will start a debate on what we can and can't do with autosomal DNA tests.

Jerry E. said...

Debbie,

There are five people that share the 33 cM segment, four of whom are from the same family (mother, two children, and one granddaughter). The fifth person is adopted and does not know her biological parents' names. I understand the comment on TMRCA. However, a brother and sister of another family who share a 24.6 cM segment are within a year or two in age with the mother of the first family and are sixth cousins.

Jerry

Jerry E. said...

Debbie,

I forgot to mention my mother and myself also match on the 33 cM segment of Chromosome 2.

Jerry

Debbie Kennett said...

Jerry, If you've got five close relations from the same family then you would expect them to have sizeable segments in common. If you look at the chart from Ancestry you'll see that they found that a segment was shared by three first cousins 80% of the time. If you have brothers and sisters, grandparents and grandchildren matching on the same segment you should consider them as one family unit. They form just one angle of the triangle. This is a really good solid match that the adoptee can work with.

The problem I'm discussing is when people who are distantly related to each other, eg, five sixth cousins, find that they all share a segment in common. If this happens it's far more likely that they share descent through multiple distant ancestral lines rather than all sharing descent from a recent genealogical ancestor.

Ann said...

Debbie,

Thank you for writing your articles. If I understand what you are saying correctly I shouldn't be surprised by the lack of other known cousins not matching a 6th cousin.

A little background. My mom, sister, aunt and I match a distant cousin and his mom. The relationships are: the two moms & my aunt are 5th half cousins once removed, and the moms & aunt to the children are 6th half cousins. The segment is 19.3 cMs and contains 3395 SNPs. I am in the US and the match is in Germany. There are two other segment matches, one of which appears to be a false positive. My mom & aunt match the son but not the mom in Germany; the segment is 9.2 cMs with 1000 SNPs. On the other segment my aunt matches both the son & mom in Germany but the segment is only 8.7 cMs and 1500 SNPs.

Both of us can trace our lines back to Maria Anna Sing, baptized 26 August 1738 in Germany. She married three times, the first time to Joseph Sing, baptized 22 May 1732. I suspect that Maria Anna & Joseph were related but I don't know how. I know who Maria Anna's & Joseph's grandparents are but I haven't found a connection between them. My mom descends from Maria Anna & Joseph. Our cousins in Germany descend from Maria Anna & her third husband Anton Braun. My cousin's line stayed in the same town until the birth of his grandmother, while my line, the son of Maria Anna & Joseph, moved to another town. His daughter had an illegitimate son who went on to have an illegitimate daughter who was my mother's great grandmother; she immigrated to the US. I have traced the illegitimate lines and so far they don't intersect with the Sing lines.

I have tested a nephew of my mom and aunt and a second cousin of my mom & aunt who do not match the cousins from Germany. My initial thoughts were that I need to find another cousin that matched on the 19.3 cM segment before I could be confident in the match.

Ann

Debbie Kennett said...

Ann

I am indeed saying that you should not be expecting other more distant cousins to match a sixth cousin on the same segment. If other distant cousins do show up as a match with any of your family members you would expect them to match on a different segment, but the chances of a second sixth-cousin match are very low anyway. You would expect you, your mum, your aunt and your sister to share the match. As you are all very close family members you represent one family unit which is essentially one angle of the triangle. The non-matches with your mum’s nephew and second cousin are again what we would expect because of the low chances of sixth cousins sharing any detectable DNA.

The segment that matches the son and not the mum does appear to be a false positive. The 8.7 cM segment could be legitimate but it’s difficult to tell without phasing. Bear in mind that we wouldn't normally expect sixth cousins to match on more than one segment. However, there is still much we don’t know about the historical levels of endogamy in different populations.

I would say that at this stage the match is strong enough for you to say that you have confirmed that you and your family are genetic cousins with your matches in Germany. The fact that Joseph and Maria Anna were probably related to each other when they married increases the probability that their DNA would have survived to the present day, but also makes it more difficult to work out the correct line of descent. You also have to bear in mind that all of our ancestors in the 1700s would have been much more closely related to each other than we are today. You therefore need to keep an open mind that it’s also possible that the relationship is through another pathway.

These distant relationships are very difficult to confirm with the current tests but you’re doing the right thing by getting lots of family members tested. There will come a time when we can use whole genome sequencing to detect relationships. That will allow us to detect SNPs that are specific to individual family lines and will help us to confirm these types of relationships with more confidence.

Anonymous said...

Hi Debbie,
Thanks for all your great research and blogs. This makes a lot of sense.

Unfortunately, I am having trouble reconciling what you write with the work I have done with 26 DNA testers descending from the Frazer Family in North Roscommon in the early 1700s. I have split this group into 2 Frazer lines dating from the early 1700s. One of the lines has no TGs. In addition, the atDNA matches have been low compared to what might be expected. The other line had 4 TGs as of last week and another one as of last week. I wrote a blog about the 4 TGs last week as one of our testers was in all 4 TGs. She is now in the 5th also. 11 out of the 13 testers are in the 5 TGs. One of those who isn't in a TG has a mother who is. Here is the blog: http://www.jmhartley.com/HBlog/?p=577
Based on your conclusions as I understand them, the CAs should be further back in time. When I first started looking into these TGs I was back another generation, then I realized based on the people in the TGs and the genealogies, that the matches should be more recent.

Joel Hartley

Debbie Kennett said...

Hi Joel

Thanks for sharing your blog post. I can see that you have identified a number of cousins who all have Frazers in their family tree and who all appear to match each other. How have you determined that the Frazer line is the only possible pathway by which these people can all be related? Do all these cousins have complete family trees for all their family lines going back to the 1700s? Are their trees all in Colonial America where you have endogamy clouding the picture? Each matching segment of a different size needs to be considered independently. Any segment will be a mixture of smaller more ancient segments. An 8 cM segment (if it is genuine and survives phasing) is likely to have a much higher frequency than a 40 cM segment and is therefore likely to be much more distant.

Anonymous said...

The Frazer Line is not the only pathway by which these people can be related. However, as all these people have Frazer ancestors, it is the most likely way and probably the most direct.

They do not have complete lines going back to the 1700s. But they have pretty good lines for Ireland. The family had genealogists that researched and wrote down much of the Irish Frazer family history about 60 or 70 years ago.

I have some colonial ancestry, but my connection is not through my colonial side. I have mapped out my grandparents' DNA based on a technique that Kathy Johnston uses. This way I can tell why only one of my sisters is in one of the particular TGs. It is because in that spot, she got her DNA from her paternal grandmother (who was a Frazer) and my other sister and I got our DNA from the paternal grandfather at that segment's location.

My Frazers came to Massachusetts from Ireland in the late 1800s. 3 of the people in the TGs are myself and my 2 sisters. Another is my 2nd cousin once removed. One other person in the TGs is from SW United States. Her Frazer ancestor moved from Ireland to Scotland before moving to the US. 2 are second cousins from Canada. Their Frazer ancestors emigrated to New York State in the mid 1800s before that. 2 people in the TGs are from England and 2 are from Australia.

I'm not sure that the match with the smaller cMs (BR) is further away. On paper, he is a closer genealogical match than the larger 40-50 cM matches. He has a lot of smaller matches with a lot of people as he is in 3 different Frazer lines. I am also in 3 lines. I have identified 2 of the lines and for the 3rd, I have guesses, but I haven't been able to place the line as it is an older maternal line.

Joel

Debbie Kennett said...

Joel, I think you're on very solid ground using Kathy's methodology to map the DNA of your grandparents so that you can determine which grandparent's line your matches are on. You would expect to have some second and third cousins in your triangulated groups. However, as I explained in the first of my two articles in this series, it's very rare to find three fourth cousins matching on the same segment if they only share descent from a *single* ancestor. It's much more likely that three fourth cousins will match if they all descend from the same gene pool and have *multiple* ancestors in common but this would mean that they're most likely to get their DNA matches through different pairwise comparisons on a variety of different lines, as in the A, B, C, D, E analogy I used in the above blog post. You seem to have a very complicated scenario with lots of intermarriages between your Frazers, including I note from your blog post, one cousin marriage. This is going to make it very challenging to determine the origin of any given segment beyond what you've determined from your careful mapping of your grandparents.

The difficulty with the GedMatch triangulation tool is that you are restricted to your top 300 matches in the GedMatch database. When you're looking at an 8 cM segment you could conceivably have hundreds of matches on this segment, but most of them won't be visible to you. You could also potentially have many more matches on 8 cM segments at FTDNA but most of them aren't reported because of the 20 cM threshold. The size of a segment is just one way of trying to determine IBD, but segment frequency is also something that needs to be taken into consideration. High-frequency segments are indicative of population sharing and are not of genealogical relevance.

Irish family history research is not easy. Most Irish people that I know get stuck on their trees in about 1850 or so though it should be easier now that so many more Irish parish registers are coming online. Once you get back into the Irish countryside you find that the families all tend to be very inter-related.

Jo Henn said...

Interesting and helpful post. Thank you for sharing it. I wanted to let you know that I have included it and the first part of the series in my NoteWorthy Reads post: http://jahcmft.blogspot.com/2016/03/noteworthy-reads-26.html

Louis Kessler said...

Debbie,

Great article. Just one quibble. You said: "It has been suggested that segments which "triangulate" must be IBD but I see no rationale for this assumption and, to the best of my knowledge, this hypothesis has not been tested. We already know that some small segments don't triangulate with close relatives. I have some examples in my blog post."

Close relatives don't necessarily have to triangulate on a segment. If that segment was not passed down to one of the three, then they won't.

But the converse is always true. If three people all match each other on a segment, then they have a common ancestor. The 3-way match makes the probability of them being IBS far too low, so they must be IBD or IBP.

Louis

Debbie Kennett said...

Segments do not skip a generation so if a child and his parent share a segment with a cousin then the child's grandparent on that same line should also share the same segment. I have examples where segments that match with a cousin and a parent don't match the grandparent.

When working with unphased data it's much easier to have coincidental false positive matches because you have two DNA letters to choose from at each position rather than just one. If it's easier to get false positive matches with two people then in theory it would also be easier to get false positive matches with three people. To find the answers we need someone to do a study of identified triangulated groups and see what happens when the data is phased. To my knowledge no one has as yet attempted this.
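
To illustrate why unphased data makes coincidental matches easier, here is a minimal Python sketch. It is a toy model and not the matching algorithm used by any testing company. The function names and the four-SNP haplotypes are invented for illustration, and real comparisons run over thousands of SNPs with allowances for genotyping errors.

```python
def genotypes(maternal, paternal):
    """Pair up two haplotypes into unordered genotypes, one per SNP.

    This mimics unphased data: we know both letters at each position
    but not which parent each letter came from.
    """
    return [frozenset(pair) for pair in zip(maternal, paternal)]


def unphased_match(geno_a, geno_b):
    """Half-identical test: at every SNP the two people share at least one letter."""
    return all(a & b for a, b in zip(geno_a, geno_b))


def phased_match(haps_a, haps_b):
    """True only if one complete haplotype of A equals one complete haplotype of B."""
    return any(ha == hb for ha in haps_a for hb in haps_b)


# Hypothetical haplotypes (maternal, paternal) for two unrelated people.
# Person A is heterozygous at every SNP, so A has a letter to offer
# at each position whatever person B carries there.
haps_a = ("AAGG", "GGAA")
haps_b = ("AGAG", "AGGA")

print(unphased_match(genotypes(*haps_a), genotypes(*haps_b)))
print(phased_match(haps_a, haps_b))
```

In this toy example the unphased comparison reports a match along the whole stretch, yet neither of person A's actual chromosomes is the same as either of person B's. The apparent match is stitched together by zig-zagging between the maternal and paternal letters.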

As a minor quibble I don't like the use of IBP. IBS is the scientific term used to describe all matches which aren't IBD. Some of these matches are IBS because they are shared at the population level and some are IBS for other reasons. False positive matches as a result of lack of phasing aren't IBS because the genotypes aren't identical - it's just that we haven't sorted them into paternal and maternal chromosomes.

Louis Kessler said...

"I have examples where segments that match with a cousin and a parent don't match the grandparent."

Debbie. How do you explain that?

Debbie Kennett said...

Louis

These are false positive matches. You have to remember that at Family Tree DNA, 23andMe and GedMatch we are dealing with unphased data. With long matching segments the lack of phasing doesn't matter, but it makes a huge difference on segments under 15 cMs in size. See the ISOGG Wiki page on identical by descent and in particular the section on false positive matches:

http://www.isogg.org/wiki/IBD#False_positive_matches

Michelle said...

Debbie,

You write "ADSA uses the "in common with" files from Family Tree DNA. This is not "true" triangulation, which requires checking that the people you match also match each other."

Can you explain further? If you match Bob and Alice, and Bob is ICW with Alice, doesn't that mean by definition that Bob and Alice appear in each other's match lists? I realize that Bob and Alice might share a segment(s) that is different from the ones you share with each of them.

Great post, hope you can clarify this point.

Debbie Kennett said...

Michelle

Blaine Bettinger has provided a good explanation of "true" triangulation in this blog post:

http://thegeneticgenealogist.com/2016/06/19/a-triangulation-intervention/
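
As a rough illustration of the difference, here is a minimal Python sketch. It is only a sketch under stated assumptions: the `Segment` type, the function names, the people and all the positions are hypothetical, segment boundaries are given as base-pair positions, and no minimum segment size is applied. An "in common with" check only asks whether two people appear in each other's match lists. A triangulation check asks whether all three pairwise matching segments overlap on the same stretch of the same chromosome.

```python
from typing import NamedTuple, Optional


class Segment(NamedTuple):
    chromosome: int
    start: int  # base-pair position (illustrative values only)
    end: int


def in_common_with(match_lists, a, b):
    """ICW-style check: a and b each appear in the other's match list."""
    return b in match_lists.get(a, set()) and a in match_lists.get(b, set())


def common_overlap(*segments) -> Optional[Segment]:
    """Return the stretch shared by all the segments, or None if there is none."""
    if len({s.chromosome for s in segments}) != 1:
        return None  # the segments are on different chromosomes
    start = max(s.start for s in segments)
    end = min(s.end for s in segments)
    if start >= end:
        return None  # same chromosome but no common stretch
    return Segment(segments[0].chromosome, start, end)


# Hypothetical data: everyone is in everyone else's match list.
match_lists = {"Bob": {"Me", "Alice"}, "Alice": {"Me", "Bob"}}
me_bob = Segment(5, 10_000_000, 30_000_000)
me_alice = Segment(5, 20_000_000, 40_000_000)

# Case 1: Bob and Alice match each other, but on chromosome 12.
bob_alice_elsewhere = Segment(12, 5_000_000, 25_000_000)
# Case 2: Bob and Alice match each other on the same stretch of chromosome 5.
bob_alice_same = Segment(5, 15_000_000, 35_000_000)

print(in_common_with(match_lists, "Bob", "Alice"))
print(common_overlap(me_bob, me_alice, bob_alice_elsewhere))
print(common_overlap(me_bob, me_alice, bob_alice_same))
```

In the first case Bob and Alice are "in common with" each other, but the three segments do not triangulate. In the second case all three pairwise segments share the stretch from 20,000,000 to 30,000,000 on chromosome 5. Even then the overlap on its own doesn't prove that the segment is IBD, for the reasons given in the comments above about unphased data.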