Saturday, 8 December 2012

23andMe's new Ancestry Composition - a British perspective

23andMe's new Ancestry Painting feature, now known as Ancestry Composition, has just been launched. The old Ancestry Painting was only able to distinguish between three continental population groupings - European, Asian and African. I was a very boring and predictable 100% European.
Ancestry Composition provides a biogeographical analysis based on 22 reference populations. 23andMe have provided an excellent guide to the science behind Ancestry Composition which is well worth reading in order to get an understanding of how the analysis works. Ancestry Composition provides a number of different views showing your comparisons with global, regional and subregional populations at three different confidence thresholds - speculative (50%), standard (75%), and conservative (90%).

My documented ancestry is all from the British Isles. I know the names and birth places of 15 of my 16 great-great-grandparents and they are all English. In this generation I have one illegitimate line which has prevented me from finding out the name of the remaining ancestor. The birthplaces of these 15 great-great-grandparents are: Burrington, Devon; Bristol (2); Thornbury, Gloucestershire; Clapham, London; Colchester, Essex; Sandon, Hertfordshire; Limehouse, London; Bermondsey, London; Merriott, Somerset; Sydenham, Kent; Sydmonton, Hampshire; Kintbury, Berkshire; Westminster, London; Sherston, Wiltshire.

I know the names of 27 of my 32 great-great-great-grandparents, but I only know the birth places of 21 of these ancestors. All of my known ancestors are from the British Isles. These are the birth places where known: Ashreigney, Devon; Mariansleigh, Devon; Thornbury, Gloucestershire; Bristol; Great Yeldham, Essex; Preston, Hertfordshire; Sandon, Hertfordshire; Scotland (place not known); Hackney, London; Laverstoke, Hampshire; County Kerry, Ireland; Merriott, Somerset; Rickmansworth, Hertfordshire; Shoreditch, London; Ecchinswell, Hampshire; Welford, Berkshire; Kintbury, Berkshire; Salford, Bedfordshire; Holborn, London; Leighterton, Gloucestershire; Purton, Wiltshire.

Ancestry Composition gives me the following percentages:

Sub-regional Resolution
Standard Estimate
17.4% British and Irish
1.6% French and German
74.2% Nonspecific Northern European
0.1% Sardinian
0.2% Nonspecific Southern European
6.5% Nonspecific European
0.1% Unassigned

Conservative Estimate 
0.3% British and Irish
71.1% Nonspecific Northern European
0.1% Nonspecific Southern European
28.0% Nonspecific European
0.5% Unassigned

Speculative Estimate
56.7% British and Irish
10.7% French and German
0.1% Scandinavian
31.2% Nonspecific Northern European
0.3% Sardinian
0.5% Nonspecific Southern European
0.4% Nonspecific European

The Sardinian and Southern European percentages are undoubtedly false positives. It is not clear if the French and German admixture appears because of the difficulties in distinguishing between British, French and German populations or if this is a reflection of more distant admixture from the Normans and Saxons.

This screenshot shows the much improved Ancestry Composition with a view of my Speculative Estimate.
These are my percentages for the Regional and Global Resolutions:

Regional Resolution
Standard Estimate
93.2% Northern European
0.2% Southern European
6.5% Nonspecific European
0.1% Unassigned

Conservative Estimate
71.4% Northern European
0.1% Southern European
28% Nonspecific European
0.5% Unassigned

Speculative Estimate
98.8% Northern European
0.9% Southern European
0.4% Nonspecific European

Global Resolution
Conservative Estimate
99.5% European
0.5% Unassigned

Standard Estimate
99.9% European
0.1% Unassigned

Speculative Estimate
100% European
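
The regional figures appear to be simple roll-ups of the sub-regional ones. The following is a minimal sketch that checks this for the Standard Estimate. The percentages are those listed above, but the mapping from sub-regional to regional labels (`REGION_OF`) and the function name `roll_up` are assumptions made for illustration, not something 23andMe publishes in this form.

```python
# Sub-regional Standard Estimate percentages, as listed above.
standard = {
    "British and Irish": 17.4,
    "French and German": 1.6,
    "Nonspecific Northern European": 74.2,
    "Sardinian": 0.1,
    "Nonspecific Southern European": 0.2,
    "Nonspecific European": 6.5,
    "Unassigned": 0.1,
}

# Assumed hierarchy: which regional label each sub-regional label belongs to.
REGION_OF = {
    "British and Irish": "Northern European",
    "French and German": "Northern European",
    "Scandinavian": "Northern European",
    "Nonspecific Northern European": "Northern European",
    "Sardinian": "Southern European",
    "Nonspecific Southern European": "Southern European",
    "Nonspecific European": "Nonspecific European",
    "Unassigned": "Unassigned",
}


def roll_up(subregional):
    """Sum sub-regional percentages into their regional groupings."""
    totals = {}
    for label, pct in subregional.items():
        region = REGION_OF[label]
        totals[region] = totals.get(region, 0.0) + pct
    # Round to one decimal place, matching the precision of the displayed figures.
    return {region: round(total, 1) for region, total in totals.items()}


regional = roll_up(standard)
total = round(sum(standard.values()), 1)
print(regional)
print(total)
```

The Northern European total comes out at 93.2, matching the regional figure. The Southern European total comes out at 0.3 rather than the 0.2 shown, and the grand total at 100.1, which is probably because each displayed figure is rounded independently.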

Although the subregional estimates do not assign me as much British ancestry as might be expected, it is worth bearing in mind that these analyses are still in their infancy. 23andMe explain in their Ancestry Composition guide that their reference populations are largely drawn from their customer base and are supplemented by public reference datasets such as the Human Genome Diversity Project, HapMap, and the 1000 Genomes Project.1 However, only a small number of genomes are as yet available in the public datasets. The 23andMe customers who are included in the reference dataset are required to have four grandparents born in the same non-colonial country. Although 23andMe were reported to have 180,000 paying customers in their database as of 9th October 2012, their customers are mostly Americans of mixed ancestry, few of whom will meet the qualifying criteria.2 In any case, not all 23andMe customers will have filled out the ancestry questionnaire. With the combination of 23andMe customers and public datasets there are just 7,868 people in the reference dataset used for Ancestry Composition. As all four of my grandparents were born in the UK I presume my own results have been included in this reference dataset. I think it is a shame that 23andMe's questionnaire does not split up the United Kingdom into its four constituent countries, as it would be more interesting to see if differences could be found between England, Scotland, Wales and Northern Ireland rather than lumping all four very different countries together.

23andMe very helpfully provide details of the reference populations that they have used in their analysis. Below are screenshots showing the figures for the reference populations which appear in my Speculative Estimate.




As can be seen, the numbers are very small, but 23andMe have designed the Ancestry Composition tool in such a way that the results can be updated on a regular basis as and when more populations are added to the reference databases so no doubt the accuracy of the predictions will improve over time. For those of us from the British Isles we can probably expect to see big improvements when the datasets from the People of the British Isles Project become available. This project has tested over 4,500 people from the UK.3 To be eligible for the project people must have not just four grandparents from the same country but four grandparents from the same rural county. It might, therefore, one day be possible to assign percentages of DNA to specific English counties or regions.

"British/Irish" DNA seems to have been a particular problem with Ancestry Composition. In 23andMe's validation tests the tool has a very high accuracy rate for the DNA which it assigns as British and Irish (a "precision" level of 0.90), but it is much less successful at identifying all British/Irish DNA as British/Irish. The technical term for this is the recall rate. The recall rate for British and Irish DNA in the 23andMe validation tests was 0.32, meaning that 68% of British and Irish DNA will not be picked up.1 The recall rate will no doubt improve as more reference samples are added to the database. However, it is difficult to quantify British or Irish DNA because we are an admixed population, comprising a mixture of DNA from many different groups such as the Saxons, Celts, Vikings, Picts, Normans, Bretons and Romans.
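
The difference between precision and recall can be made concrete with a small sketch. The counts below are hypothetical, chosen only so that they reproduce the reported figures of 0.90 and 0.32. They are not 23andMe's actual validation data, and the function names are my own.

```python
def precision(true_pos, false_pos):
    """Of everything labelled British/Irish, the fraction that really is."""
    return true_pos / (true_pos + false_pos)


def recall(true_pos, false_neg):
    """Of everything that really is British/Irish, the fraction labelled as such."""
    return true_pos / (true_pos + false_neg)


# Hypothetical counts of DNA segments (illustrative only, not 23andMe data).
true_pos = 288   # British/Irish segments correctly labelled British/Irish
false_pos = 32   # other segments wrongly labelled British/Irish
false_neg = 612  # British/Irish segments given some other label

p = precision(true_pos, false_pos)
r = recall(true_pos, false_neg)
missed = 1 - r

print(f"precision = {p:.2f}")
print(f"recall    = {r:.2f}")
print(f"missed    = {missed:.0%}")
```

With these made-up counts a British/Irish label is right nine times out of ten, yet 68% of the genuinely British/Irish segments end up under another label, such as Nonspecific Northern European.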

Chromosome view
As well as the map view there are two alternative views: split view and chromosome view. To use the split view it is necessary to have one parent in the 23andMe database. As my parents have not tested with 23andMe I cannot make use of this feature. I can, however, access the chromosome view, which provides an interesting breakdown of the various percentages on the individual chromosomes. The screenshot below shows my Speculative Estimate.
With so many similar shades of blue it's quite difficult to distinguish the individual populations that make up each chromosome, though you can hover over a specific population to get a clearer picture. The screenshot below picks out the chromosomes where 23andMe speculates that I match with French and German populations.
As can be seen, whole chromosomes seem to have been matched with French and German populations, which I don't quite understand. I don't have any French or German ancestry within the last several hundred years. At the population level all British people would be expected to share many markers in common with the French and the Germans, but after several hundred years have passed I would have thought that there would only be tiny segments of "French" and "German" DNA scattered throughout my genome.
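
The intuition about tiny segments can be illustrated with some simple arithmetic. This is a rough sketch which assumes that each ancestor in a given generation contributes an equal share of autosomal DNA on average. In practice recombination makes the actual shares vary, and the function names are my own.

```python
def ancestors_in_generation(n):
    """Number of ancestral lines n generations back (parents = 1)."""
    return 2 ** n


def expected_share(n):
    """Average fraction of autosomal DNA from one ancestor n generations back."""
    return 1 / 2 ** n


# Generation 4 = great-great-grandparents, 5 = great-great-great-grandparents.
for n in (4, 5, 10):
    print(n, ancestors_in_generation(n), f"{expected_share(n):.4%}")
```

Generations 4 and 5 give the 16 great-great-grandparents and 32 great-great-great-grandparents mentioned earlier. By generation 10, which is roughly 250 to 300 years back if a generation is assumed to be 25 to 30 years, a single ancestor would be expected to contribute less than a tenth of one per cent on average.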

Neanderthal DNA
In addition to Ancestry Composition another interesting and fun feature of the 23andMe test is that it will give you your percentage of Neanderthal admixture. This feature was introduced in December 2011.4 23andMe estimate that 2.5% of my DNA is inherited from Neanderthals.
Neanderthal percentages are also provided by the new Geno 2.0 test from the Genographic Project, which also provides percentages of Denisovan DNA. I imagine that 23andMe will eventually update their test to provide Denisovan percentages.

Conclusion
Ancestry Composition is a great improvement on 23andMe's Ancestry Painting. The percentages seem to be much more accurate than those provided by AncestryDNA. 23andMe also has the advantage of providing technical information on the methodology used by its scientists, along with valuable details of the reference populations used for the analysis, features which are notably absent at AncestryDNA. Family Tree DNA's Family Finder test includes a tool known as Population Finder. An update to Population Finder is expected in the New Year and it will be interesting to see how this compares with Ancestry Composition.

Other blog posts on Ancestry Composition
A number of other bloggers have written about their experiences with Ancestry Composition or provided commentary. Here is a list of the posts I have found to date. I will update the list as and when new posts are discovered:
- 23andMe's new Ancestry Painting - first look! by CeCe Moore. This post includes screenshots showing statistics on all the reference populations used by the Ancestry Composition tool.
- 23andMe Ancestry Composition Examples Part 1 by Andrea Badger. This post includes a magnificent selection of screenshots from people with a variety of mixed heritage producing a wonderful rainbow of colours.
- New worldview at 23andMe by Roberta Estes.
- My Ancestry Composition from 23andMe by Aidan Byrne.
- 23andMe Ancestry Composition by Dienekes Pontikos.
- Admixture advances by Judy Russell.
- 23andMe adds ancestry composition by John Reid.
- Is Daniel MacArthur 'desi' by Razib Khan.

References
1. Ancestry Composition: 23andMe's State-of-the-Art Geographic Ancestry Analysis. Anonymous article on the 23andMe website. Accessed 8th December 2012.
2. How many paying customers does 23andMe have? Answer provided on Quora.com website by 23andMe software developer Alex Kohmenko on 9th October 2012.
3. The website of the People of the British Isles Project keeps track of the collection progress. As of 8th December 2012 it was reported that 4,538 samples had been collected.
4. Find your inner Neanderthal. 23andMe blog post by Scott H, 15th December 2011.

See also
My four part feature on "Exploring my genome with 23andMe":
Part 1 Disease risks
Part 2 Carrier status and drug responses
Part 3 Traits
Part 4 Ancestry

© 2012 Debbie Kennett

2 comments:

Anonymous said...

Interesting article, thanks!

So it seems most of your recent ancestors are from England and, more specifically, southern England, which is known to be the area least impacted by the Viking/Scandinavian invasions. I read that the Scottish have more Scandinavian DNA as a result of the location of the Viking invasions. Your lack of Scandinavian DNA (0.1%) could be explained by this.

Debbie Kennett said...

Many thanks. I think you would indeed expect people from Scotland and northern England to cluster more closely with the Scandinavians. On FTDNA's MyOrigins I am 5% Scandinavian and 3% Finland and Northern Siberia.