Cruwys news

Friday, 2 March 2018

DNA interviews at Rootstech

Updated 5th March 2018.

Jill Ball has been out and about at Rootstech interviewing some of the speakers and representatives from the various companies and sharing them on her YouTube channel. She has a very interesting interview with Jonny Perl who developed the wonderful new DNA Painter website. Jonny was the very worthy winner of Rootstech's Innovator Showdown Contest. Remarkably he only took his first DNA test in December 2016! You can watch the interview below or direct on YouTube.

Jill Ball has also interviewed Hannah Morden of Living DNA at Rootstech. You can watch the interview below or on YouTube.

See also my blog post from yesterday on Living DNA's new Family Networks feature.

Here is an interview with CeCe Moore, the genetic genealogist on the US TV programme Finding Your Roots. The direct YouTube link can be found here.

Melanie McComb, the Shamrock Genealogist, has also done a very interesting interview with CeCe, touching on some of the ethical implications of adoption searches. The interview is available on Twitter via this link. (You shouldn't need a Twitter account to watch the interview.)

If you want to keep up with what's going on at Rootstech, Randy Seaver is maintaining a useful compilation of blog posts from people who are at the event. He's also written a very helpful post on how to download the free handouts from the various talks.

Louis Kessler is doing an excellent job of keeping track of events from afar. Check out his posts:

You can watch the Rootstech livestream here.

If you missed the livestream you can watch the recordings here.

Thursday, 1 March 2018

New Family Networks feature from Living DNA

Yesterday at Rootstech, Living DNA provided a sneak preview of Family Networks, their long-anticipated relative-matching system. It is described as a "new DNA-driven matching system and family tree reconstruction method". You can find out more in the video below.

Living DNA Family Networks from Living DNA on Vimeo.

Family Networks is now in private beta-testing and will be in open beta in the third quarter of this year when it will become available to all existing and new Living DNA users. I've been sent a few screenshots which I've reproduced below.

Here is a tree view.

This is the chromosome browser.

This is what the match list will look like.

Here is the official press release I received from Living DNA.

LIVING DNA PREVIEWS UNIQUE NEW “FAMILY NETWORKS” OFFERING AT ROOTSTECH 2018

Innovative family tree and matching system will take the guesswork out of DNA relationships

Living DNA, the global consumer genetics company, has today publicly previewed its new ‘Family Networks’ platform for the first time – set to be the most precise DNA-driven matching service on the market.

Officially unveiled in Salt Lake City in Utah at RootsTech 2018, the world’s largest family-history technology conference, Living DNA’s Family Networks requires no prior user-generated family research, allowing users to build a detailed family tree based solely on their DNA, gender, and age.

Living DNA will analyse a user's unique motherline and fatherline DNA data (mtDNA and YDNA), on top of the family ancestry line (autosomal) to deliver matches – something no other company can do.

David Nicholson, managing director and co-founder at Living DNA, comments:

“With Family Networks, we will not only predict how users are related to direct matches, but we can also find and connect people to DNA matches going back up to 13 generations.

“The technology behind Family Networks automatically works out which genetic trees are possible to uncover relations. This new capability offers distinct benefits to a range of users, from avid genealogists and family history hobbyists through to adoptees and others searching for their family members. It will reduce the risk of human error and take away the tedious task of figuring out how each person in a user’s list are related to one another. We’re truly taking the guesswork out of DNA relationships.”

Living DNA’s Family Networks is scheduled to be made available to all existing and new Living DNA users by autumn 2018. The company states that the cutting-edge technology will give all customers – even those who upload from other DNA testing sites – a level of relationship prediction and accuracy that is beyond anything currently on the market.

David Nicholson adds:

“Living DNA’s precise and unique technology processes users’ DNA to identify relatives and define relationships deeper back in time. Through this rich experience, users will even be able to learn how they’re related to people with whom they share no DNA today.

“As we don’t ask for Gedcom files or other user research to build a family tree, Family Networks can be especially useful for adoptees and family searchers who are trying to locate long-lost family members but who don’t have any information on their biological family. Just by using their gender and date of birth in conjunction with their DNA, we will be able to translate their matches into a potential family tree, giving them a clearer place to start from.”

Living DNA breaks down users’ DNA into 80 worldwide regions, including 21 in the UK, more than any other testing company. The company offers a 3-in-1 test as standard: from a simple mouth swab, Living DNA not only covers a user’s family line ancestry, but, unlike most other tests, it also includes the user’s motherline and (if male) fatherline ancestry.

Living DNA’s test itself is run on a custom-built Living DNA Orion Chip. It is one of the first bespoke DNA chips in the world to be built using the latest GSA technology from market leader Illumina, and tests over 656,000 autosomal (family) markers, 4,700 mitochondrial (maternal) markers and 22,000 Y-chromosomal (paternal) markers.

There are a few additional details in a slightly different press release which appears on the Living DNA website. The relevant text is reproduced below.

Free DNA-Driven Family Tree Reconstruction and Matching System Method Offers Greater Accuracy Than Competing Products, Takes Guesswork Out of DNA Relationships.

SALT LAKE CITY, Utah – Feb. 28, 2018 – Living DNA, the global consumer genetics company, today announced it will preview “Family Networks”—a new DNA-driven matching system and family tree reconstruction method—at RootsTech 2018, the world’s largest family-history technology conference taking place Feb. 28 – March 3 in Salt Lake City, Utah. Requiring no prior user-generated family research, Living DNA’s family reconstruction tree method is based solely on users’ DNA, gender, and age. Unlike competing organisations, Living DNA’s Family Networks will provide the most precise matching service on the market by analysing a user's unique motherline and fatherline DNA data (mtDNA and YDNA), on top of the family ancestry line (autosomal).

“With Family Networks, we not only predict how users are related to direct matches, but we can also infer through DNA up to 13 generations back to connect matches with whom they share no DNA with today,” said Living DNA co-founder and Managing Director David Nicholson. “The technology behind Family Networks runs through millions of ways in which users in the network are related and automatically works out which genetic trees are possible. This new capability offers distinct benefits to a range of users, from avid genealogists to family history hobbyists, to adoptees and others searching for their family members. It will reduce the risk of human error and support the task of figuring out how each person in a user’s list are related to one another.”

Family Networks will go into private beta in Q2 and open beta in Q3 2018 where it will be available to all existing and new Living DNA users. The unique computation this feature provides gives customers - even those who upload from other DNA testing sites - a level of relationship prediction and specificity beyond anything currently on the market. Where competing offerings rely solely on time-consuming and often error-prone user research, Living DNA’s amazing power tools process users’ DNA to identify relatives and define relationships deeper back in time. Through this extremely rich experience, users can even learn how they’re related to people with whom they share no DNA today.

Users need to only provide their gender and birthdate for Living DNA to build a family tree that shows where their matches fit into their family tree, with no need of Gedcom files or any other user input. This can be especially useful for adoptees and family searchers who are trying to locate long-lost family members but who don’t have any information on their biological family, Living DNA can translate their matches into a potential family tree, giving them a clearer place to start from.

I strongly believe that genetic networks are the future of genetic genealogy so I'm excited to see that Living DNA have developed this new feature. It will be interesting to see how it works out in practice.

See also

DNA – One Family, One World - a recording of the presentation given at Rootstech by David Nicholson and Hannah Morden of Living DNA
DNA interviews at Rootstech - Hannah Morden from Living DNA is interviewed by some of the Rootstech Ambassadors

Monday, 22 January 2018

Small segments and pile-ups - a visualisation

We've recently been discussing the problem of pile-ups in the All Genetic Genealogy group on Facebook. A pile-up is a term used in genetic genealogy to describe multiple shared autosomal DNA segments that are stacked up on top of each other on the same part of the genome. The presence of a pile-up should be considered as a warning sign. For any shared segment to have genealogical significance we would expect it to be shared only with descendants of the common ancestral couple. If we share a segment with hundreds or thousands of people it is extremely unlikely that we will share that section of DNA by virtue of a recent genealogical relationship within the last ten generations or so, and it is much more likely to be indicative of a false match or a more distant relationship.

Pile-ups can occur for a number of different reasons:

Lack of phasing. Phasing is the process of sorting the DNA letters (the As, Cs, Ts and Gs) onto the paternal and maternal chromosomes. AncestryDNA and MyHeritage now use phased matching which means that they phase our genotypes before trying to identify shared sections of DNA. 23andMe and Family Tree DNA use a process of half-identical matching. Our DNA is not phased but instead the algorithms zigzag backwards and forwards across two columns of unsorted DNA letters looking for consecutive runs of matching SNPs. Half-identical matching works well at identifying large shared segments of DNA but is less successful on smaller segments, and particularly segments under about 10 centiMorgans (cMs) in size. If a match does not survive phasing it is a false match.
SNP-poor regions. The autosomal DNA tests used for genetic genealogy provide information on between 630,000 and 700,000 genetic markers known as SNPs (single nucleotide polymorphisms) which are scattered across the genome. These SNPs are only a tiny fraction of the three billion letters which make up the human genome, but the SNPs are specially selected for being the most informative about variations within and between populations. When trying to identify shared regions of the genome the companies are looking for long runs of consecutive SNPs that are the same (identical by state or IBS) in two individuals. Segments which pass the companies' matching thresholds are declared to be identical by descent (IBD) and are possibly indicative of shared ancestry in a genealogical timeframe. Some companies will also apply additional algorithms to filter out known problematic regions which are unlikely to be IBD. However, because not all of our SNPs are being tested, the length of a segment can be falsely inflated. One hypothesis is that lots of small segments can become conflated into longer segments. (1) This problem is particularly likely to occur in sections of the genome which have poor coverage on the chips. (2)
Excess IBD. This is a term used to describe sections of the genome which are known to be widely shared in humans or in certain populations. Such regions often offer some type of evolutionary advantage. For an overview of known excess IBD regions see the section on excess IBD sharing in the ISOGG Wiki article on IBD. In addition to looking at the size of a shared segment, some IBD detection algorithms will, therefore, also take into account the frequency of the segment. (3) The more people who share a segment, the older it is likely to be. AncestryDNA apply their proprietary Timber algorithm to phased segments and they downweight the cM count for segments that are widely shared in their database. (4)
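To make the difference between half-identical and phased matching more concrete, here is a minimal Python sketch. It is a toy illustration only and not the algorithm used by any of the companies: real matching works on hundreds of thousands of SNPs, applies centiMorgan and SNP-count thresholds, and tolerates some genotyping errors. The function names, the six-SNP haplotypes and the `min_snps` threshold are all made up for the example.

```python
def genotypes(hap1, hap2):
    """Combine two haplotype strings into unphased genotypes, one per SNP."""
    return [frozenset((a, b)) for a, b in zip(hap1, hap2)]


def half_identical_runs(geno_a, geno_b, min_snps):
    """Return (start, end) SNP indexes of runs where the two people share
    at least one allele at every consecutive SNP (half-identical matching)."""
    runs, start = [], None
    n = min(len(geno_a), len(geno_b))
    for i in range(n + 1):
        match = i < n and bool(geno_a[i] & geno_b[i])
        if match and start is None:
            start = i
        elif not match and start is not None:
            if i - start >= min_snps:
                runs.append((start, i - 1))
            start = None
    return runs


def survives_phasing(haps_a, haps_b, start, end):
    """True if one whole haplotype of A equals one whole haplotype of B
    across the run, i.e. the match does not rely on zigzagging."""
    return any(
        ha[start:end + 1] == hb[start:end + 1]
        for ha in haps_a
        for hb in haps_b
    )


# Hypothetical six-SNP haplotypes (paternal, maternal) for two people.
haps_a = ("AACTCG", "AGTTCG")
haps_b = ("AGCTTA", "GGCTTG")

runs = half_identical_runs(genotypes(*haps_a), genotypes(*haps_b), min_snps=3)
print(runs)                                   # [(0, 3)]
print(survives_phasing(haps_a, haps_b, 0, 3))  # False
```

In this invented example the unphased comparison finds a run covering the first four SNPs, but no single haplotype of one person matches a single haplotype of the other across that run. The apparent match only exists because the comparison switches between the two chromosomes, which is the sense in which a match "does not survive phasing".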

Each individual has their own personal pile-ups. It can be instructive to map out your pile-ups so that you are aware of your own danger zones. I've previously used Don Worth's ADSA (autosomal DNA segment analyser) tool which is available from DNAGedcom to look at my pile-ups. I've also used the matching segment search at GEDmatch (this tool is available to Tier 1 subscribers). (5) These tools are very useful for identifying problems in specific regions but it's difficult to get a good idea of the bigger picture.

Following on from our discussion in the All Genetic Genealogy Facebook group, Dan Edwards has been working on an exciting tool to provide a new way of visualising pile-ups. It's possible that the tool will eventually be made available on the web but for the moment it is a bespoke service. Dan has been experimenting on some of my data. He has produced for me some charts showing the distribution of shared segments across my 22 autosomes and on the X-chromosome. Dan has kindly given me permission to share my charts which are reproduced below.

The charts are based on my Family Finder chromosome browser data from Family Tree DNA. FTDNA updated their match thresholds in May 2016, but they are still the only company that continue to include small segments under 6 cMs when inferring a relationship. It is generally accepted by genetic genealogists that the use of such small segments is problematical. (6)

The problem with small segments can be clearly seen in the charts below. Rather than being distributed evenly across my genome, the smaller shared segments form huge spires and skyscrapers. As the segment size increases the pile-ups are greatly reduced, but there are still some parts of my genome which have some quite sizeable pile-ups on segments over 10 cMs in size. Chromosomes 9, 14, 18 and 19, in particular, seem to have a few problem areas which it is probably best for me to avoid. As more matches come in, these spires and skyscrapers can be expected to grow even more. Remember too that FTDNA only reports "matches" on small segments if the match thresholds have already been met. If small segments were reported for everyone in the database down to 1 cM it's likely that the spires would be even more pronounced.
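For anyone who wants to try something similar on their own chromosome browser data, the basic idea of counting pile-ups can be sketched in a few lines of Python. This is not Dan's tool and I don't know how his charts are produced; it is just one simple way of counting how many shared segments cover each stretch of a chromosome. The tuple layout, the one-megabase bin size and the example segments are all invented for illustration.

```python
from collections import Counter


def pile_up_depth(segments, min_cm=0.0, bin_size=1_000_000):
    """Count how many shared segments overlap each fixed-size bin.

    segments: iterable of (chromosome, start_bp, end_bp, cm) tuples.
    Segments smaller than min_cm are ignored.
    """
    depth = Counter()
    for chrom, start, end, cm in segments:
        if cm < min_cm:
            continue
        for b in range(start // bin_size, end // bin_size + 1):
            depth[(chrom, b)] += 1
    return depth


def tallest_spire(depth):
    """Return ((chromosome, bin), count) for the deepest pile-up, or None."""
    return max(depth.items(), key=lambda item: item[1], default=None)


# Invented segments: (chromosome, start, end, cM).
segments = [
    (9, 1_200_000, 3_400_000, 2.1),
    (9, 2_100_000, 2_900_000, 1.4),
    (9, 2_500_000, 5_000_000, 3.0),
    (9, 2_000_000, 14_000_000, 11.5),
    (14, 10_000_000, 12_000_000, 2.2),
]

print(tallest_spire(pile_up_depth(segments)))             # ((9, 2), 4)
print(max(pile_up_depth(segments, min_cm=10).values()))   # 1
```

With all segments included, the made-up data stacks four segments on the same part of chromosome 9. Once everything under 10 cMs is filtered out, only one segment is left there and the spire disappears.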

If Dan is able to develop his tool further and make it more widely available it will be interesting to see how other people's pile-ups compare with mine. I hope that we might also be able to identify a reason for some of the pile-ups. In the meantime I hope you enjoy looking at my pictures.

Footnotes

(1) See: Chiang CWK, Ralph P, Novembre J (2016). Conflation of short identity-by-descent segments bias their inferred length distribution. G3 Genes Genomes Genetics 6: 1287.

(2) For a useful overview of SNP coverage on the chips used by AncestryDNA and 23andMe see Rebekah Canada's series of articles on the subject of exploring microarray chips.

(3) For a good overview of the methodology of IBD detection see Browning and Browning (2012): Identity by descent between distant relatives: detection and applications (Annual Review of Genetics 2012; 46: 617-33). The authors state: "The key idea behind IBD segment detection is haplotype frequency. If the frequency of a shared haplotype is very small, the haplotype is unlikely to be observed twice in independently sampled individuals, so one can infer the presence of an IBD segment. This criterion can be applied in several ways. The first is length of sharing, which is a proxy for frequency. If two densely genotyped haplotypes are identical at all or most (allowing for some genotyping error) assayed alleles over a very large segment of a chromosome, then the haplotypes are likely to be identical by descent across the whole segment. The second is direct use of haplotype frequency: Shared haplotypes with estimated frequency below some threshold are determined to be identical by descent. The third makes use of a population genetics model to infer probability of IBD. Given the frequency of the shared haplotype and a probability model for the IBD process along the chromosome, one can estimate the probability that the individuals are identical by descent at any position on the segment."

(4) For a good explanation of how the AncestryDNA algorithm works read the blog post by Julie Granka on Filtering DNA matches at AncestryDNA with Timber. Take a look in particular at the figure in that blog post. Although the majority of phased segments filtered out by Timber are smaller segments under 15 cMs, note that it also downweights some larger segments up to 50 cMs in size.

(5) Peter Alefounder has developed a tool known as the Geneal Segment Stacker but I've not yet had time to play around with it. There are further details in this thread in the ISOGG Facebook group.

(6) For an excellent summary on the current state of our knowledge on the subject of small segments see the blog post A small segment round up by Blaine Bettinger.

Further reading

Chromosome pile-ups in genetic genealogy: examples from FTDNA and 23andMe. Genealogy and Genomics, 31 January 2015.

Sunday, 14 January 2018

A chromosome browser and a new matching algorithm at MyHeritage

There was a big update at MyHeritage on Thursday this week. They rolled out their updated matching algorithms and also introduced a new chromosome browser feature. MyHeritage have written an excellent blog post which explains the changes in more detail and also provides a good overview of the technicalities of DNA matching written in easy-to-understand language. You can read the article here:

Major updates and improvements to MyHeritage DNA matching

All MyHeritage customers are currently automatically opted in to DNA matching. If, for any reason, you do not want to be notified of matches you can opt out in the My Privacy DNA settings.

I previously had 49 matches at MyHeritage. The new algorithms have allowed them to drop the threshold and report more distant matches. I now have a grand total of 1474 matches. Before the changeover I found that 72% of my matches did not match either of my parents. Previously I had to go through all my matches one by one and check whether or not they matched my parents. Now, if I click on my matches with my mum and dad, I can see the tally of the matches along with a list of all the matches I share with them. I now share 530 matches with my dad and 473 with my mum. This means that 1003 of my 1474 matches (68%) match my parents. The mismatch rate has been reduced to 32% which is a huge improvement. MyHeritage announced at the end of December that they had tested 1.08 million people so the number of matches is much more in line with what we might expect from such a large database. MyHeritage advised in November that the majority of their customers were in the US but that "sales in Europe are strong".
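The percentages above can be checked with a few lines of Python. Note the assumption built into the sum: it treats the two parental lists as separate, so any match who appears on both lists would be counted twice and the true match rate would be slightly lower.

```python
total_matches = 1474
shared_with_dad = 530
shared_with_mum = 473

# Assumes no match appears on both parents' lists; any overlap would be
# double-counted and the true figure would be slightly lower.
matching_a_parent = shared_with_dad + shared_with_mum
match_rate = matching_a_parent / total_matches

print(matching_a_parent)        # 1003
print(f"{match_rate:.0%}")      # 68%
print(f"{1 - match_rate:.0%}")  # 32%
```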

There are some useful filters which can be used to sort your matches. Currently you can view matches that have family trees, shared surnames and Smart Matches.

I found that 1255 of my 1474 matches (85%) have uploaded trees. However, no indication is given of the completeness of the trees, and I've noticed that some of the trees only contain a single person.

Two hundred and thirty-one of my matches have shared ancestral surnames. On a brief perusal, many of these are common surnames like Johnson and Williams, and the people I match with these surnames seem to be mostly in America and will likely have no connection with Berkshire or Devon where my ancestors with these surnames are to be found. I would suggest it's best to focus on shared matches with rarer surnames.

I like the way that MyHeritage displays country flags as this makes it much easier to identify people in the countries where you are most likely to find recent genetic cousins. Even better, it is possible to filter matches by country, as well as searching for matches by surname and full name. The menu can be found on your DNA Matches page.

Note that the country search box will only accept a single word so if you are searching for matches from Great Britain simply enter the word "Great". Similarly if you're trying to locate matches from New Zealand search for the word "New". I currently have 123 matches from Great Britain, 12 matches from Ireland, 62 matches from Australia, 16 matches from New Zealand, 41 matches from Canada and 867 matches from the USA. Many thanks to Louise Coakley for alerting me to this filter and for the tip about searching for matches from Great Britain and New Zealand.

MyHeritage have also added a chromosome browser so that you can see a visual display of your matches. You need to scroll right down to the bottom of the match page to locate the tool. Here's the chromosome browser view of my closest match from the UK.

If I click on the Advanced Options on the top right of the chromosome browser I can download the matching segment data. In this case my match shares three segments of DNA with me which are 13.07 cMs, 6.04 cMs and 6.14 cMs respectively in size.

I recognise the names of some people who match me at other companies. I've not done a proper check but my sense is that the people who match me as 3rd to 5th cousins at MyHeritage are assigned more distant relationships at Ancestry (4th to 6th cousin or 5th to 8th cousins). Given that I'm not able to make the genealogical connections with these people I suspect the AncestryDNA estimates are more appropriate.

There's also a facility to sort matches by shared DNA, largest segment, full name and most recent. Apart from my mum and dad, I currently have no matches closer than third to fifth cousin. My highest match is somebody in America who shares 31.9 cMs with me (0.4%) spread across four segments. However, the longest segment is only 12.8 cMs. This match only shares a total of 12.8 cMs (0.2%) with my dad. I can see that the remaining three segments this match shares with me that are not shared with my dad are all very small (6.49, 6.03 and 6.62 cMs respectively) so I would guess that these are false positive segments.
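As a quick consistency check on the figures quoted for this match, the four segment sizes can be added up in Python. The sketch assumes, as the numbers imply, that the 12.8 cM segment is the one shared with the father; the variable names are made up for the example.

```python
# Segment sizes in cM as quoted above; the first is the longest segment,
# assumed here to be the one that is also shared with the father.
segments_cm = [12.8, 6.49, 6.03, 6.62]
shared_with_father = [True, False, False, False]

total = sum(segments_cm)
unconfirmed = sum(
    cm for cm, confirmed in zip(segments_cm, shared_with_father) if not confirmed
)

print(round(total, 2))        # 31.94
print(round(unconfirmed, 2))  # 19.14
```

The four segments add up to the reported total of about 31.9 cMs, and roughly 19 cMs of that comes from the three small segments that do not appear in the father's match.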

Partnership with FTDNA

MyHeritage use the Family Tree DNA labs in Houston, Texas, for their testing. If you've tested at MyHeritage you have the option of taking advantage of the free transfer to Family Tree DNA. The link can be found at the bottom of your DNA results page.

Further details of the transfer programme can be found here.

Similarly, if you've tested at FTDNA you can transfer your results free of charge to MyHeritage using the MyHeritage Upload link. Both companies have different databases and you will find people in both databases who have not tested elsewhere. You never know where you are going to get those all-important breakthrough matches so it's best to "fish in all the ponds".

Conclusion

MyHeritage have done an excellent job overhauling their matching algorithms. It is surprisingly difficult with current technology to identify distant matches, especially when results are being combined across different platforms. I think that MyHeritage are going about the matching in the right way and they are being very responsive to the feedback provided by genetic genealogists. I am sure we will see further improvements in the months and years to come. I look forward to receiving many more matches and to confirming my first relationship at MyHeritage DNA.

Other reviews

Kudos to MyHeritage by Lorna Henderson
MyHeritage overhauls their matching algorithm by Leah Larkin
MyHeritage improves DNA matches by John Reid
