23andMe's new Ancestry Painting feature, now known as Ancestry Composition, has just been launched. The old Ancestry Painting was only able to distinguish between three continental population groupings - European, Asian and African. I was a very boring and predictable 100% European.
Ancestry Composition provides a biogeographical analysis based on 22 reference populations. 23andMe have provided an excellent guide to the science behind Ancestry Composition, which is well worth reading to get an understanding of how the analysis works. Ancestry Composition offers a number of different views comparing your DNA with global, regional and sub-regional populations at three confidence thresholds: speculative (50%), standard (75%) and conservative (90%).
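To see why the same stretch of DNA can be labelled differently at the three thresholds, here is a toy Python sketch. It assumes that a specific label is kept only when its probability reaches the threshold, and that otherwise the probabilities are pooled at the next broader level. That fallback rule, the `label_segment` function, the `REGION_OF` table and the probabilities are all illustrative assumptions of mine, not 23andMe's actual code or data.

```python
# Hypothetical mapping from sub-regional populations to their broader region.
REGION_OF = {
    "British and Irish": "Northern European",
    "French and German": "Northern European",
    "Scandinavian": "Northern European",
    "Sardinian": "Southern European",
}


def label_segment(probs, threshold):
    """Return the most specific label whose probability reaches the threshold."""
    # Step 1: try the single most likely sub-regional population.
    best = max(probs, key=probs.get)
    if probs[best] >= threshold:
        return best

    # Step 2: pool the probabilities by region and try again.
    regional = {}
    for population, p in probs.items():
        region = REGION_OF[population]
        regional[region] = regional.get(region, 0.0) + p
    best_region = max(regional, key=regional.get)
    if regional[best_region] >= threshold:
        return "Nonspecific " + best_region

    # Step 3: pool everything at the continental level.
    if sum(probs.values()) >= threshold:
        return "Nonspecific European"
    return "Unassigned"


# Made-up probabilities for one segment of DNA (they sum to 1).
segment = {
    "British and Irish": 0.60,
    "French and German": 0.20,
    "Scandinavian": 0.05,
    "Sardinian": 0.15,
}

for name, threshold in [("speculative", 0.50), ("standard", 0.75), ("conservative", 0.90)]:
    print(name, "->", label_segment(segment, threshold))
```

With these made-up numbers the segment comes out as British and Irish at the speculative threshold, Nonspecific Northern European at the standard threshold, and Nonspecific European at the conservative threshold.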
My documented ancestry is all from the British Isles. I know the names and birth places of 15 of my 16 great-great-grandparents and they are all English. In this generation I have one illegitimate line which has prevented me from finding out the name of the remaining ancestor. The birthplaces of these 15 great-great-grandparents are: Burrington, Devon; Bristol (2); Thornbury, Gloucestershire; Clapham, London; Colchester, Essex; Sandon, Hertfordshire; Limehouse, London; Bermondsey, London; Merriott, Somerset; Sydenham, Kent; Sydmonton, Hampshire; Kintbury, Berkshire; Westminster, London; Sherston, Wiltshire.
I know the names of 27 of my 32 great-great-great-grandparents, but I only know the birth places of 21 of these ancestors. All of my known ancestors are from the British Isles. These are the birth places where known: Ashreigney, Devon; Mariansleigh, Devon; Thornbury, Gloucestershire; Bristol; Great Yeldham, Essex; Preston, Hertfordshire; Sandon, Hertfordshire; Scotland (place not known); Hackney, London; Laverstoke, Hampshire; County Kerry, Ireland; Merriott, Somerset; Rickmansworth, Hertfordshire; Shoreditch, London; Ecchinswell, Hampshire; Welford, Berkshire; Kintbury, Berkshire; Salford, Bedfordshire; Holborn, London; Leighterton, Gloucestershire; Purton, Wiltshire.
Ancestry Composition gives me the following percentages:
Sub-regional Resolution
Standard Estimate
17.4% British and Irish
1.6% French and German
74.2% Nonspecific Northern European
0.1% Sardinian
0.2% Nonspecific Southern European
6.5% Nonspecific European
0.1% Unassigned
Conservative Estimate
0.3% British and Irish
71.1% Nonspecific Northern European
0.1% Nonspecific Southern European
28.0% Nonspecific European
0.5% Unassigned
Speculative Estimate
56.7% British and Irish
10.7% French and German
0.1% Scandinavian
31.2% Nonspecific Northern European
0.3% Sardinian
0.5% Nonspecific Southern European
0.4% Nonspecific European
The Sardinian and Southern European percentages are undoubtedly false positives. It is not clear if the French and German admixture appears because of the difficulties in distinguishing between British, French and German populations or if this is a reflection of more distant admixture from the Normans and Saxons.
This screenshot shows the much improved Ancestry Composition with a view of my Speculative Estimate.
These are my percentages for the Regional and Global Resolutions:
Regional Resolution
Standard Estimate
93.2% Northern European
0.2% Southern European
6.5% Nonspecific European
0.1% Unassigned
Conservative Estimate
71.4% Northern European
0.1% Southern European
28% Nonspecific European
0.5% Unassigned
Speculative Estimate
98.8% Northern European
0.9% Southern European
0.4% Nonspecific European
Global Resolution
Conservative Estimate
99.5% European
0.5% Unassigned
Standard Estimate
99.9% European
0.1% Unassigned
Speculative Estimate
100% European
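As a rough cross-check on how the three resolutions relate to one another, the sub-regional Conservative figures can be rolled up into the regional categories. The short Python sketch below does this with the percentages quoted above. The `REGION_OF` grouping is my own assumption about which sub-regional labels belong to which region, not something taken from 23andMe.

```python
# Sub-regional Conservative Estimate, as quoted above (percentages).
conservative_subregional = {
    "British and Irish": 0.3,
    "Nonspecific Northern European": 71.1,
    "Nonspecific Southern European": 0.1,
    "Nonspecific European": 28.0,
    "Unassigned": 0.5,
}

# Assumed grouping of sub-regional labels into regional ones.
REGION_OF = {
    "British and Irish": "Northern European",
    "Nonspecific Northern European": "Northern European",
    "Nonspecific Southern European": "Southern European",
    "Nonspecific European": "Nonspecific European",
    "Unassigned": "Unassigned",
}

# Add up the sub-regional percentages within each region.
regional = {}
for label, pct in conservative_subregional.items():
    region = REGION_OF[label]
    regional[region] = regional.get(region, 0.0) + pct

# Round to one decimal place to match the way the figures are displayed.
regional = {region: round(pct, 1) for region, pct in regional.items()}

for region, pct in regional.items():
    print(f"{pct}% {region}")
```

The totals agree with the regional Conservative Estimate (71.4% Northern European, 0.1% Southern European, 28.0% Nonspecific European, 0.5% Unassigned). Doing the same sum for the Standard and Speculative figures lands within 0.1 of the regional values, which is presumably the result of rounding in the displayed percentages.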
Although the sub-regional results do not assign me as much British ancestry as might be expected, it is worth bearing in mind that these analyses are still in their infancy. 23andMe explain in their Ancestry Composition guide that their reference populations are largely drawn from their customer base and are supplemented by public reference datasets such as the Human Genome Diversity Project, HapMap, and the 1000 Genomes Project.1 However, only a small number of genomes are as yet available in the public datasets. The 23andMe customers who are included in the reference dataset are required to have four grandparents born in the same non-colonial country. Although 23andMe were reported to have 180,000 paying customers in their database as of 9th October 2012, their customers are mostly Americans of mixed ancestry, few of whom will meet the qualifying criteria.2 In any case, not all 23andMe customers will have filled out the ancestry questionnaire. With the combination of 23andMe customers and public datasets there are just 7,868 people in the reference dataset used for Ancestry Composition. As all four of my grandparents were born in the UK I presume my own results have been included in this reference dataset. I think it is a shame that 23andMe's questionnaire does not split the United Kingdom into its four constituent countries. It would be more interesting to see whether differences could be found between England, Scotland, Wales and Northern Ireland than to have four very different countries lumped together.
23andMe very helpfully provide details of the reference populations that they have used in their analysis. Below are screenshots showing the figures for the reference populations which appear in my Speculative Estimate.
As can be seen, the numbers are very small. However, 23andMe have designed the Ancestry Composition tool in such a way that the results can be updated on a regular basis as and when more populations are added to the reference databases, so no doubt the accuracy of the predictions will improve over time. Those of us from the British Isles can probably expect to see big improvements when the datasets from the People of the British Isles Project become available. This project has tested over 4,500 people from the UK.3 To be eligible for the project people must have not just four grandparents from the same country but four grandparents from the same rural county. It might, therefore, one day be possible to assign percentages of DNA to specific English counties or regions.
"British/Irish" DNA seems to have been a particular problem with Ancestry Composition. The tool has a very high accuracy rate for the DNA which it does assign as British and Irish in the validation tests (a "precision" level of 0.90), but it is much less successful at identifying all British/Irish DNA as British/Irish. The technical term for this is the recall rate. The recall rate for British and Irish DNA in the 23andMe validation tests was 0.32 (32%), meaning that 68% of British and Irish DNA will not be picked up.1 The recall rate will no doubt improve as more reference samples are added to the database. However, it is difficult to quantify British or Irish DNA because we are an admixed population, comprising a mixture of DNA from many different groups such as the Saxons, Celts, Vikings, Picts, Normans, Bretons and Romans.
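For anyone unfamiliar with the terms, precision and recall are simple ratios, and a small Python sketch may make the difference clearer. The counts below are hypothetical: I have chosen them only so that they reproduce the 0.90 precision and 0.32 recall quoted for British and Irish DNA. They are not 23andMe's validation data.

```python
def precision(true_pos, false_pos):
    """Of everything labelled British/Irish, the share that really is."""
    return true_pos / (true_pos + false_pos)


def recall(true_pos, false_neg):
    """Of everything that really is British/Irish, the share labelled as such."""
    return true_pos / (true_pos + false_neg)


# Hypothetical counts of DNA segments, picked to match the quoted rates.
true_pos = 288   # British/Irish segments correctly labelled British/Irish
false_pos = 32   # segments from elsewhere wrongly labelled British/Irish
false_neg = 612  # British/Irish segments given some other, broader label

p = precision(true_pos, false_pos)
r = recall(true_pos, false_neg)

print(f"precision = {p:.2f}")             # precision = 0.90
print(f"recall    = {r:.2f}")             # recall    = 0.32
print(f"missed    = {(1 - r) * 100:.0f}%")  # missed    = 68%
```

In other words, a British and Irish label is right nine times out of ten when it is given, but roughly two thirds of genuinely British and Irish DNA ends up under a broader heading such as Nonspecific Northern European.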
Chromosome view
As well as the map view there are two alternative views: split view and chromosome view. To use the split view it is necessary to have one parent in the 23andMe database. As my parents have not tested with 23andMe I cannot make use of this feature. I can, however, access the chromosome view, which provides an interesting breakdown of the various percentages on the individual chromosomes. The screenshot below shows my Speculative Estimate.
With so many similar shades of blue it's quite difficult to distinguish the individual populations that make up each chromosome, though you can hover over a specific population to get a clearer picture. The screenshot below picks out the chromosomes where 23andMe speculates that I match with French and German populations.
As can be seen, whole chromosomes seem to have been matched with French and German populations, which I don't quite understand. I don't have any French or German ancestry within the last several hundred years. At the population level all British people would be expected to share many markers in common with the French and the Germans, but after several hundred years I would have expected to find only tiny segments of "French" and "German" DNA scattered throughout my genome.