Cruwys news: Comparing parent and child matches at AncestryDNA

Sunday, 6 August 2017

Comparing parent and child matches at AncestryDNA

A number of genetic genealogists have done comparisons of parent and child matches at AncestryDNA to see how many of the smaller matches do not match either parent:

Ann Raymont published an article in July 2016, "When is a match a false positive?" She found that 35.3% of her matches did not match either parent.

Blaine Bettinger wrote a blog post on "The danger of distant matches" back in January this year. He found that 32% of his matches were not shared with either of his parents.

Kevin Ireland published his results in an article entitled "atDNA case study: two parents and one child". He found that 18.5% of his matches did not match either parent.

Karin Lovisa Borgerson has this week published a blog post, "Baby versus bathwater: distant matches". She found that 17% of her matches were not shared with either parent.

I recently tested both of my parents at AncestryDNA and I thought it would be an interesting exercise to do a detailed analysis of my own matches to see how my results compare with the other studies. I used the DNAGedcom Client to download my matches from AncestryDNA. I used the Match-O-Matic tool, which is included with the Client, to analyse my matches. See the methodology section below for details of how the analyses were done.

My matches at AncestryDNA
I tested on the AncestryDNA v1 chip in June 2012. I currently have 10,232 matches at AncestryDNA.

Using the same categories as Blaine Bettinger my matches break down like this:

Category	Number	Percentage
Over 50 cMs	9	0.08%
25 cMs or more	39	0.38%
20 cMs or more	71	0.69%
15 cMs or more	251	2%
10 cMs or more	1403	14%
Fewer than 10 cMs	8837	86%
6-7 cMs	4512	44%

Two of my matches in the over 50 cMs category are my parents. All the other matches have tested independently at AncestryDNA.

Sharing with my parents at AncestryDNA

My parents were tested on the AncestryDNA v2 chip in June 2017.

My dad has 8350 matches at AncestryDNA.

My mum has 11285 matches at AncestryDNA.

Here are my findings after comparing my matches with my mum and dad:

3299 (32%) of my 10232 matches are shared with my dad
3276 (32%) of my 10232 matches are shared with my mum
20 (0.2%) of my matches appear on the match lists of both my mum and my dad.
3671 (36%) of my matches do not appear on the match lists of either of my parents.
Of the 3671 matches which do not match either of my parents 3559 (97%) shared a single DNA segment and 112 (3%) shared 2 segments. Ninety-six (86%) of the two segment matches shared less than 10 cMs, and 16 (14%) shared between 10 and 16 cMs.

I divided the matches into "bins" to see what the match rate was for different levels of sharing. The results are shown in the table below.

cM bins	Total matches	Total matching a parent	% matching a parent	Total matching neither parent	% matching neither parent
50 cMs +	9	9	100%	0	0%
40-50 cMs	1	1	100%	0	0%
30-40 cMs	10	10	100%	0	0%
20-30 cMs	51	51	100%	0	0%
19-20 cMs	25	25	100%	0	0%
18-19 cMs	18	17	94%	1	6%
17-18 cMs	30	30	100%	0	0%
16-17 cMs	46	45	98%	1	2%
15-16 cMs	61	58	95%	3	5%
14-15 cMs	96	86	90%	10	10%
13-14 cMs	148	134	91%	14	9%
12-13 cMs	180	161	89%	19	11%
11-12 cMs	291	274	94%	17	6%
10-11 cMs	437	386	88%	51	12%
9-10 cMs	799	679	85%	120	15%
8-9 cMs	1275	1001	79%	274	21%
7-8 cMs	2241	1534	68%	707	32%
6-7 cMs	4512	2058	46%	2454	54%
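
The arithmetic behind the table can be checked with a few lines of Python. This is only a sketch: the rows are copied from the table above, and the function name `unmatched_rate` and the choice of 10 cMs as the cut-off are my own illustrative assumptions, not part of the DNAGedcom tools.

```python
# Rows copied from the table above:
# (cM bin, total matches, matches shared with at least one parent)
BINS = [
    ("50+", 9, 9),
    ("40-50", 1, 1),
    ("30-40", 10, 10),
    ("20-30", 51, 51),
    ("19-20", 25, 25),
    ("18-19", 18, 17),
    ("17-18", 30, 30),
    ("16-17", 46, 45),
    ("15-16", 61, 58),
    ("14-15", 96, 86),
    ("13-14", 148, 134),
    ("12-13", 180, 161),
    ("11-12", 291, 274),
    ("10-11", 437, 386),
    ("9-10", 799, 679),
    ("8-9", 1275, 1001),
    ("7-8", 2241, 1534),
    ("6-7", 4512, 2058),
]


def unmatched_rate(rows):
    """Return (total, unmatched, percent unmatched) for a list of bin rows."""
    total = sum(t for _, t, _ in rows)
    matched = sum(m for _, _, m in rows)
    unmatched = total - matched
    return total, unmatched, round(100 * unmatched / total, 1)


# The first 14 rows are the bins of 10 cMs and above; the last 4 are below 10 cMs.
ten_plus = unmatched_rate(BINS[:14])
under_ten = unmatched_rate(BINS[14:])

print("10 cMs or more:", ten_plus)       # (1403, 116, 8.3)
print("Fewer than 10 cMs:", under_ten)   # (8827, 3555, 40.3)
print("Unmatched overall:", ten_plus[1] + under_ten[1])  # 3671
```

On these figures about 8% of the matches of 10 cMs or more are missing from both parents' lists, compared with about 40% of the matches under 10 cMs.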

Double matches

I also took a look at the 20 matches that my parents shared with each other. My parents do not appear as matches to each other and do not have any identifiable common genealogical ancestors. I also checked to see if these shared matches appeared on my match list. Here is the breakdown:

Match	cMs shared with Dad	cMs shared with Mum	Match to Debbie
Match 1	8.7576	7.9563	Yes
Match 2	9.2653	6.984	Yes
Match 3	9.3549	7.363	No
Match 4	6.082	7.5295	Yes
Match 5	6.4811	8.0814	No
Match 6	9.3519	7.4345	Yes
Match 7	27.0902	8.1206	No
Match 8	10.009	14.0302	Yes
Match 9	6.5761	9.6979	Yes
Match 10	6.262	7.8819	No
Match 11	6.4201	7.5106	No
Match 12	7.9815	6.1787	Yes
Match 13	9.2775	6.3962	No
Match 14	11.4808	6.3302	Yes
Match 15	7.26	6.1234	Yes
Match 16	6.8033	7.0212	Yes
Match 17	8.8658	6.2839	Yes
Match 18	7.1988	6.0201	Yes
Match 19	6.5682	8.9985	Yes
Match 20	10.7345	14.5475	Yes

Discussion

Although 36% of my matches did not match either of my parents, the headline figure is not as gloomy as it might at first appear. The vast majority of these non-matches were on small segments under 10 cMs, and the lion's share of non-matches were on the very tiny segments under 7 cMs.

All matches sharing 19 cMs or more were shared with one of my parents so this can be considered my personal safe zone where matches are guaranteed to be valid.

There were just five out of my 155 matches sharing between 15 and 19 cMs which did not match one of my parents. The largest non-shared match was 18.2 cMs. This means that about 97% of my matches in this range were shared with a parent.

Below 10 cMs the proportion of matches not shared with a parent rises steeply, from 15% at 9-10 cMs to 54% at 6-7 cMs. Of the very smallest matches sharing just 6-7 cMs, only 46% matched one of my parents.

Matches that do not match my parents are either false positives, which means the matches are not real matches, or false negatives, which means that the match is not showing up in the match list of my parent for one reason or another. However, without further investigation it is not possible for me to determine whether these matches are false positives or false negatives. This can only be done by careful chromosome mapping and by testing multiple close family members.

AncestryDNA use a phased matching technique. Phasing is the process of assigning individual alleles to the maternal and paternal chromosomes. A lack of phasing results in many false matches. These are sometimes known as pseudosegments. For a discussion of the reasons for these false matches see the ISOGG Wiki article on identical by descent. Because AncestryDNA use phasing they are able to deliver matches on smaller segments than the other companies. While phasing provides more accurate matches, the process is not without its problems. One of the limitations is that we are not tested on our whole genome but rather a sampling of markers scattered across our genome. If matching were to be done on the whole genome we would no doubt find that many of our matches are not valid after all. A second problem is that the phasing algorithms are not perfect. Sometimes they break up a longer match into smaller segments. There is also a problem of what are known as phase switch errors, when the phase accidentally switches from the maternal to the paternal chromosome or vice versa.

I am fortunate that I've been able to test both my parents which allows me to do a sanity check on my matches. However, if you are not able to test your parents you will have no way of knowing which of the small segment matches are likely to be valid. It was also interesting to note that some of my matches were shared by both my parents. If I hadn't tested both my parents or if I'd only tested one of my parents I could easily have been led astray with these matches.

Matches at Family Tree DNA and 23andMe are not phased so the false match rate is going to be even higher there.

Even if these small segments are real, the odds are still stacked against the match falling within a genealogical timeframe. We know from computer simulations that over 60% of 10 cM segments are likely to trace back beyond ten generations. This does of course also mean that fewer than 40% of 10 cM matches are likely to fall within the last ten generations, but computer simulations have the advantage of working in an idealised world where every segment can be reliably attributed. In real life it is much more complicated and with the current matching algorithms there is an additional risk that these segments will not be accurately identified.

The genetic genealogy community has long been aware of the problems of using small segments in genealogical research, and my findings simply add to the existing evidence base. I already have 1403 matches at AncestryDNA that share 10 cMs or more. I can still only find genealogical connections with a handful of those matches. There is really no reason to get down in the weeds with these small segments under 10 cMs.

Methodology

I am using version 1.5.1.3 of the DNAGedcom Client with a PC running Windows 7. The DNAGedcom Client is a subscription service costing $5 a month. The Match-O-Matic tool is included in the subscription. Match-O-Matic was designed for a Mac but converted to a Windows format by Rob Warthen for use in the DNAGedcom Client. For details of the DNAGedcom Client and Match-O-Matic see the user guide.

I downloaded my match lists into Excel spreadsheets using the DNAGedcom client on 4th and 5th August.

I used the Match-O-Matic tool provided with the DNAGedcom client to analyse my matches.

To see how many matches I shared with my parents I used the report labelled Matches in common (matches in both files) [ICW] to combine the match lists for my mum and dad.

To see how many of my matches did not appear in either of my parents' match lists I then used the report labelled Combine files (all matches without duplicates) [ALL] to merge my parents' two lists. In order to get the program to work correctly I renamed the output file with the prefix m_.

I used the report labelled Matches in A that are not in b [ANB] to extract a list of matches that were in my match list but were not in the combined match list of my parents.
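
For readers without the DNAGedcom Client, the three reports used above correspond to ordinary set operations. The sketch below is not Match-O-Matic's own code. It assumes that every match carries a stable unique identifier in each downloaded list, and the function names and sample identifiers are hypothetical.

```python
def in_common(a, b):
    """ICW: matches that appear in both lists."""
    return set(a) & set(b)


def combine(a, b):
    """ALL: every match from either list, without duplicates."""
    return set(a) | set(b)


def a_not_b(a, b):
    """ANB: matches in list a that do not appear in list b."""
    return set(a) - set(b)


# Hypothetical match identifiers standing in for the downloaded match lists.
child = {"m1", "m2", "m3", "m4", "m5"}
dad = {"m1", "m2", "m9"}
mum = {"m2", "m3", "m8"}

shared_with_dad = in_common(child, dad)
shared_with_mum = in_common(child, mum)
shared_with_both = in_common(shared_with_dad, shared_with_mum)

# Merge the parents' lists first, then remove them from the child's list.
parents = combine(dad, mum)
neither = a_not_b(child, parents)

print(sorted(shared_with_dad))   # ['m1', 'm2']
print(sorted(shared_with_mum))   # ['m2', 'm3']
print(sorted(shared_with_both))  # ['m2']
print(sorted(neither))           # ['m4', 'm5']
```

Merging the parents' lists before subtracting means that a match shared with both parents is only counted once.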

Acknowledgements

Thank you to Rob Warthen for developing the DNAGedcom Client. Thank you to Don Worth for developing the Match-O-Matic. Thank you to Richard Weiss for advice on using Match-O-Matic.

Update
Since publishing my blog post Alex Coles has also done an analysis of her parent and child matches at AncestryDNA. Alex and her parents all tested on the v1 chip. Alex found that 31% of her matches did not match either parent. All the non-matches were below 17 cMs apart from one intriguing outlier. Read Alex's article "Imprecise science. Part 1 AncestryDNA" on her Winging It blog.

16 comments:

Jackson said...: Really thorough discussion --thanks!; 8 August 2017 at 11:42
Alex Coles said...: As Jackson said, this is a very thorough discussion, thanks for sharing your results. I was inspired to run a similar process over my own matches, and ended up with fairly similar results - see http://wing-ops.blogspot.co.nz/2017/08/imprecise-science-part-1-ancestrydna.html.; 10 August 2017 at 06:44
Debbie Kennett said...: Thank you Alex for doing this analysis. I've added a link to your article in my blog post in a section at the end on updates. I hope that other people will also rise to the challenge. I still don't have any sense of the false positive versus false negative rate in these non-matches. I suspect that more of them are false negatives. However, this exercise has highlighted the need for caution when working with small segment matches and we need to factor in the uncertainty when drawing conclusions about genealogical relationships.; 10 August 2017 at 12:48
Kerrie Anne Christian said...: thanks for referring me to this article - very interesting and informative; 25 August 2017 at 05:08
Bob Braxton said...: A reader who seemed mildly annoyed with me pointed me to this excellent post which I greatly appreciate. Personally I view my exploration less as "conclusions" and more as "process" and so do not, up front, exclude contacting anyone by trying to ferret out the false negative / false positive. I am 73 years old and got FT-DNA family finder results 13 months ago July 15 after testing early April, 2016. I have not unlocked a major "knot" on paternal (who is father of my grandmother Flossie Emma) and maternal (who is father of my 3rd highest "half-aunt" surprise match or, conversely, who is father to my own mother - if not the one I have always known as grand-daddy). As much as I can, I let the DNA lead where it may rather than denying a connection. My saying "DNA doesn't lie - except when it does" - reflects that especially in the "small potatoes" DNA is not totally exact science; however, based on the excitement of those who were adopted when they find family (biological), it seems light years better than paper trail only of the past before DNA discovered.; 25 August 2017 at 06:29
Debbie Kennett said...: Robert, I think most people tend to regard the research as an ongoing process. However, I do sometimes see people writing blog posts claiming that they have proven a particular connection based on small segments. In every case they have failed to consider any alternative hypotheses.

DNA is great for providing clues and with close relations it doesn't lie. However, the interpretation of DNA evidence is not always so straightforward especially once you get beyond the second cousin level. It is easy for people to jump to false conclusions if they don't consider all the possibilities. Now that the databases are so big, most people should have at least a few high-quality matches that they can work with.; 25 August 2017 at 13:32
Bob Braxton said...: Less than 14 months ago we began quest to find potential (half-) sister who appears as infant "ward" in 1940 census. My parents had eight (one has died, I am firstborn) and this would potentially be a half-sister - we have not yet found. What I am finding: my paternal grandmother reared as surname WRIGHT has a half-sister Rachel PERRY. My DNA matches a male PERRY Jr. at 2nd to 4th cousins; but for myself it is about relationship, not simply proof - so I am dependent upon the kind of response (or not) in reaching out. Also what I was not looking for, surprise on my maternal line - even closer match: my 3rd highest, female who grew up in Iowa (I grew up in NC) - the only higher matches are my one full brother and a brother of my mother. This DNA match is at the level of half-aunt. I follow the DNA hints as well as high match while also using the full range of FT-DNA suggested matches as clues. To me it would be a mistake to decide in advance which ones NOT to pursue. When I chose to test DNA, I made the choice to be flexible and to be open - since if what I already knew was not shakeable there would have been no best reason to have tested DNA to begin with, in my view.; 25 August 2017 at 13:45
Debbie Kennett said...: Robert, It's great that you've made so much progress with your research. I find the best strategy is to focus on people who share the surnames and who have ancestry from the same locations. I only have two shaky leaf hints at Ancestry and no DNA Circles so I have to work with what I've got. All clues are worth following up, but we just need to exercise caution with the matches on smaller segments and recognise that many of them might not be legitimate matches.; 25 August 2017 at 14:13
Unknown said...: Debbie, has anyone tried painting the father and mother matches onto the tested parts of the child genome, to see where there are gaps (which is presumably where the fake matches are coming from)?

Can we guess how much of the total test area is not inherited, and - with more data - will we find any patterns as to which areas are more likely to have these random segments?; 16 October 2017 at 16:02
Debbie Kennett said...: At AncestryDNA we don't have access to segment information so we have no way of knowing where the mismatches are occurring. I wonder if the problem is not that there are gaps but that these matches are occurring in SNP-poor areas and the segment is not one continuous 7 cM segment but perhaps two 3.5 cM segments broken up by other alleles in the middle which aren't on the chip.

Rebekah Canada did a very interesting series looking at the SNP density on the different chips:

http://haplogroup.org/exploring-microarray-chips/; 22 October 2017 at 00:30
Kent J said...: Thank you for your post it is very informative. You can also make these comparisons with an EXCEL macro which you can download for free at https://sortingdnacousins.blogspot.com/p/blog-page.html . It would be interesting to see what your updated FTDNA results might show.; 14 December 2017 at 16:27
Debbie Kennett said...: Thank you for the link to the Excel Macro. I didn't know about that tool. I shall experiment when I have a bit more free time.; 14 December 2017 at 16:45
Kent J said...: I appreciate your testing your matches. I was not able to test my parents, so thank you.

The Macro should be easy to use. You put one match list in the second worksheet tab and the other match list in the third tab. The match lists should begin in row 2. Next run the Macro. It will ask you if you want to look at ICW or notICW. Next the Macro asks for two worksheet columns you want to use to determine ICW or not ICW. The Macro will then make the comparison for you. If you only want to compare one column, then just enter the same column twice.

With this Macro you can make complex comparisons by taking the results from one run and then comparing them to another match list, etc. I like to create a match list containing the matches from all of my siblings in one worksheet. You can use the EXCEL duplicate function to eliminate duplicates. This gives me the most complete representation of the matches of our parents, whose DNA we were unable to test.

You can get fairly creative with the macro, however it is important that users realize your findings and others on false matches.; 15 December 2017 at 00:31
Debbie Kennett said...: Many thanks for the information about the Macro. I shall have to experiment when I have a bit more time in the New Year!; 15 December 2017 at 11:56
Leonard McCown said...: I wonder on these tests if more than one child of a couple are tested. I find it very interesting with 4 siblings it is interesting to see how they match the father and the mother. Some lean toward one parent or the other.; 3 April 2024 at 14:59
Debbie Kennett said...: It would be very interesting to do a comparison with four siblings and their parents though beyond the second cousin level we wouldn't expect all siblings to have the same matches. Perhaps somebody with enough siblings would be up to the challenge.; 3 April 2024 at 15:34
