Showing posts with label Admixture tests. Show all posts

Thursday, 13 September 2018

Updated Ethnicity Estimates now available for everyone at AncestryDNA

I wrote back in June about my updated Ethnicity Estimate at AncestryDNA. Yesterday AncestryDNA rolled out the updates to everyone in their database. Many people will find that the changes are quite dramatic. I went from being just 21% Great Britain to 94% England and Wales, and my results are now a much better reflection of my recent ancestry within the last few hundred years. There have been a few tweaks since I got my results and the England and Wales cluster has now been renamed as England, Wales and Northwestern Europe.


The improvements have been made possible by the inclusion of many more people in the reference panel, which has now gone up from 3,000 to 16,000 samples. Previously Ancestry had just 111 samples from Great Britain, 138 from Ireland and 166 from Europe West. Now they have 1,519 samples from England, Wales and Northwestern Europe, 500 from Ireland and Scotland, 1,407 from France and 2,072 from Germanic Europe. AncestryDNA are also using a different methodology and are comparing long stretches of linked markers rather than single markers in isolation. This means that the results are a reflection of our more recent ancestry within the last 500 to 1,000 years rather than our distant ancestry from one thousand or more years ago.

AncestryDNA have written a White Paper explaining the methodology, which includes details of all the reference populations used. They will also be publishing a scientific paper about their methods.

Most people with British and Irish ancestry have found that their results are greatly improved and are much more in line with their known ancestry. The results will be more mixed for people from other countries. You can only be matched to the populations in the reference panel so if your country is not represented you will be matched to the next closest population. For example, AncestryDNA now has reference populations for Norway, Sweden and Finland but no distinct dataset for Denmark. Danes are therefore likely to get matched with Norway and Sweden or England, Wales and Northwestern Europe.
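
The closed-panel principle described above can be illustrated with a toy sketch. This is not AncestryDNA's method: real estimates assign proportions across several populations using far more complex models. The two-dimensional "genetic coordinates" below are invented purely for illustration, and the function name is hypothetical. The point is only that a sample from an unrepresented population still receives one of the existing labels.

```python
import math

# Hypothetical 2-D "genetic coordinates" for a few reference populations.
# These numbers are invented for illustration only.
REFERENCE_PANEL = {
    "Norway": (2.0, 5.0),
    "Sweden": (3.0, 5.5),
    "England, Wales and Northwestern Europe": (0.0, 3.0),
}


def closest_population(sample, panel):
    """Return the panel label whose coordinates are nearest to the sample."""
    return min(panel, key=lambda name: math.dist(sample, panel[name]))


# A made-up sample with no panel entry of its own: it can only ever be
# given one of the labels that already exist in the panel.
unlabelled_sample = (1.5, 4.2)
print(closest_population(unlabelled_sample, REFERENCE_PANEL))
```

However far a sample sits from every reference population, the function always returns a label, which is why a missing population shows up as its nearest neighbours instead.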

India, with a vast and diverse population of over 1.3 billion people, is poorly represented with just 65 samples from Western and Central India. There is also still a long way to go to get more meaningful results for people with African ancestry. There is more genetic diversity in Africa than in the rest of the world combined, which means that much larger reference panels are needed to capture this diversity. Ancestry are addressing this problem by starting an African Diversity Project, and we can look forward to further improvements in the years to come.

I always used to say that "ethnicity" estimates should be taken with a large pinch of salt and are really only of entertainment value, but we are now starting to get to the stage where the results for some people can provide a reasonable approximation of their ancestry. If you've already done your family history research, the results won't tell you anything more than you already know, but at least there should now be a lot less confusion. As more populations are added to the reference panels we can expect to see similar improvements for other populations.

Update 15th September 2018
AncestryDNA will be presenting a poster at the ASHG conference in San Diego in October on Polly, the algorithm they are using for their updated ethnicity estimates. Here are the details:
PgmNr 2772/W: High-throughput local ancestry inference reveals fine-scale population history
Authors: A. Sedghifar (1); S. Song (1); Y. Wang (1); K. Noto (1); J. Byrnes (1); E.L. Hong (1); K.G. Chahine (1); C.A. Ball (2)
Affiliations: (1) AncestryDNA, San Francisco, CA; (2) AncestryDNA, Lehi, UT.
An individual’s genome can be viewed as a mosaic of haplotype blocks from different ancestral origins, the sizes of which depend on the timing of admixture events. Recovering the length of these local ancestry blocks, together with their ethnic origin, provides information on the admixture and recombination events that shape current day genomes, thus shedding light on personal history as well as population history. As genomic databases rapidly approach sizes on the order of millions of genomes, there is an increased demand for super efficient approaches to identifying local ancestry blocks. Our team has developed Polly, an ultra fast algorithm for estimating genome-wide ancestry proportions in admixed individuals. Here, we present a modification of the Polly algorithm for accurately inferring local ancestry blocks. We evaluated the performance of our algorithm on simulated admixed individuals, and also assessed accuracy of estimated tract length distributions in admixed populations. Finally, we applied our method to estimate tract length distributions in historically admixed African American and Latin American populations.
The poster can be seen here.

The ASHG abstracts can be searched here.

Further reading
I've provided links below to the various official documents from AncestryDNA along with links to a few other blogs which might be of interest.

AncestryDNA links
Blogs

Tuesday, 11 July 2017

Parent and child comparisons at AncestryDNA

I've now had both my parents tested at AncestryDNA and their results have recently come in. By testing my parents I will be able to assign matches to paternal and maternal sides. My parents will potentially match people who are not on my own match list and they will have more robust matches than I do with more distant matches. I thought it would be a useful exercise to take stock of our matches, admixture reports and genetic communities to serve as a baseline for future comparisons.

DNA results and matches pages
Here is my dad's results page. He currently has 54 fourth cousins or closer, and 157 pages of matches making a total of 7850 matches. He has three shared ancestor hints, no DNA Circles and no New Ancestor Discoveries.


Here is my mum's results page. She currently has 116 fourth cousins or closer, and 212 pages of matches making a total of 10600 matches. She has no shared ancestor hints, no DNA Circles and no New Ancestor Discoveries.


Here is my results page. I currently have 66 fourth cousins or closer and 193 pages of matches (9650 matches). I have two shared ancestor hints, no DNA Circles and no New Ancestor Discoveries.
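
The totals quoted above are consistent with a fixed 50 matches per page. That page size is an inference from the figures, not something stated by AncestryDNA, and because the last page may be only partly filled the totals are best read as upper estimates. A minimal sketch of the arithmetic:

```python
# Assumption: 50 matches per page, inferred from the totals quoted above.
MATCHES_PER_PAGE = 50

# Page counts as reported on each results page.
pages = {"dad": 157, "mum": 212, "me": 193}

# Upper estimate of total matches (the final page may not be full).
totals = {person: count * MATCHES_PER_PAGE for person, count in pages.items()}

for person, total in totals.items():
    print(f"{person}: {pages[person]} pages -> up to {total} matches")
```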


Note that the shared hints shown above do not include shaky leaf hints for the parent/child relationships. When I first checked the results, hints were provided but the relationships were shown as aunt/uncle and nephew/niece rather than parent/child. I presume this was a bug as these hints have now disappeared.

As a result of testing my parents I've now gained one new shaky leaf hint. This is a predicted 5th to 8th cousin who shares a single segment of 11 centimorgans with my dad. According to the family trees they are third cousins twice removed and their common ancestors are William Cruwys (1793-1846) and Margaret Eastmond (1792-1874) who married in 1814 in Rose Ash, Devon. One of their sons emigrated to Prince Edward Island in Canada and this match is a descendant of this PEI family. Fortunately she has provided a detailed family tree, but I shall also look forward to corresponding with her and comparing notes. Interestingly this lady does not appear in my own match list so it looks as though I have not inherited this single segment from my dad.

I now also have a new filter on my match page:


This filter allows me to see at a glance which matches I share with my mum and which matches I share with my dad. However, the list is restricted to those matches which are fourth cousins or closer. I can understand the restriction on shared matches for cousin relationships but it would be useful if AncestryDNA would let us sort our entire match list by paternal and maternal matches.
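
For anyone who keeps their own lists of matches, the same paternal and maternal sorting can be sketched with simple set operations. This assumes you have the match names (or IDs) for yourself and both parents in three lists; the function name and the sample names are hypothetical, and nothing here uses an AncestryDNA interface.

```python
def assign_sides(my_matches, dad_matches, mum_matches):
    """Sort matches by which parent also has them, using set operations."""
    my, dad, mum = set(my_matches), set(dad_matches), set(mum_matches)
    return {
        "paternal": (my & dad) - mum,    # shared with dad only
        "maternal": (my & mum) - dad,    # shared with mum only
        "both": my & dad & mum,          # shared with both parents
        "unassigned": my - dad - mum,    # on my list but neither parent's
        "dad_only": dad - my,            # dad's matches I did not inherit
        "mum_only": mum - my,            # mum's matches I did not inherit
    }


# Hypothetical match names for illustration.
sides = assign_sides(
    my_matches=["Ann", "Bob", "Cat", "Eve"],
    dad_matches=["Ann", "Dan", "Eve"],
    mum_matches=["Bob", "Eve"],
)
print(sides["paternal"], sides["maternal"], sides["dad_only"])
```

The "dad_only" group corresponds to cases like the PEI cousin above, who matches my dad but does not appear in my own list. Matches in the "unassigned" group are worth treating with caution, since a genuine match would normally be expected to match at least one parent.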

Comparing admixture percentages
Now let's have a look at the admixture results in more detail. AncestryDNA call this report an "Ethnicity Estimate" though strictly speaking ethnicity is self-determined and has no bearing on our genetic ancestry. AncestryDNA say that the admixture reports reflect our ancestry from "thousands of years ago". I cannot trace our family tree back thousands of years but here are the details of my dad's recent genealogical ancestry:
  • Four grandparents born in England: Bristol, Gloucestershire, London (x2).
  • Eight great-grandparents born in England: Bristol (x2), Devon, Essex, Gloucestershire, Hertfordshire, London (x2).
  • Fifteen great-great-grandparents born in England: Devon (x2), Bristol, Essex, Gloucestershire, Hertfordshire (x2), London.
  • One great-great-grandparent born in Scotland (location not known). The birthplace of seven of his English great-great-grandparents is unknown. Four were probably born in Bristol or in a nearby county. Three were Londoners who could have moved to London from anywhere in England.
Here is my dad's Ethnicity Estimate.

Here are the details of my mum's genealogical ancestry:
  • Four grandparents born in England: London (x2), Hampshire (x2).
  • Eight great-grandparents born in England: Berkshire, Hampshire, London (x3), Somerset, Wiltshire. The birthplace of one great-grandparent is not known but he was probably born in London.
  • Fifteen great-great-grandparents born in England: Bedfordshire, Berkshire (x2), Gloucestershire, Hampshire (x2), Hertfordshire, London (x2), Somerset (x2), Wiltshire.
  • One great-great-grandparent born in Ireland: County Kerry. The birthplace of three of her English great-great-grandparents is unknown. One was probably born in Hampshire. The other two were probably Londoners who could have come from anywhere in the country.
Here is my mum's Ethnicity Estimate.
Here is my own Ethnicity Estimate.

As can be seen, there is a wide variation in the results and there is little correlation between the admixture percentages and our known genealogical ancestry. Admixture results can sometimes provide useful insights but the results should not be taken too literally. It's also worth remembering that, although the percentages have been given labels based on modern nation states, the regions which these labels cover extend well beyond the present-day national boundaries, as can be seen from my ancestry map below. The Irish component actually extends over much of the United Kingdom. The Great Britain component overlaps with Ireland and extends into northern Europe. The Europe West component extends into southern and eastern England.
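
For comparison, here is a minimal sketch of what the paper trail alone would predict, on the simplifying assumption that each ancestor in a complete generation contributes an equal share on average. Actual inheritance varies around these averages, and the admixture clusters do not map neatly onto countries, so this is only a rough yardstick. The function name is hypothetical.

```python
from collections import Counter
from fractions import Fraction


def expected_shares(birthplaces):
    """Average expected share per country for one complete generation."""
    share = Fraction(1, len(birthplaces))
    totals = Counter()
    for country in birthplaces:
        totals[country] += share
    return dict(totals)


# Great-great-grandparents by country of birth, as listed above.
dad_gg = ["England"] * 15 + ["Scotland"]
mum_gg = ["England"] * 15 + ["Ireland"]

for label, generation in (("dad", dad_gg), ("mum", mum_gg)):
    for country, share in expected_shares(generation).items():
        print(f"{label}: {country} {float(share) * 100:.2f}%")
```

On these averages a single great-great-grandparent accounts for one sixteenth of the pedigree, or 6.25%.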


Genetic communities
Genetic communities provide information about our genetic ancestry within the last few hundred years. They are also a useful way of filtering your matches so that you can focus on the matches who have family trees from the same country and the same locations as you where you stand the greatest chance of identifying a genealogical connection. I'm currently in one genetic community for the Southern English. The confidence level is 95%. I have 63 matches amongst the 204,681 AncestryDNA members in this community.

My mum and dad both have two communities: Southern English and The Welsh & English West Midlanders. In both cases the confidence level for the Southern English community is 95% and the confidence level for the Welsh community is 20%.


My dad has 45 matches in the Southern English community and nine matches amongst the 58,768 AncestryDNA members who are in the Welsh & English West Midlanders community.

My mum has 77 matches in the Southern English community and 14 matches in the Welsh & English West Midlanders community.

Neither my mum nor my dad have any known ancestry from Wales or the West Midlands. However, on looking at the map of this community, you can see that it covers a wider area and actually extends into Gloucestershire, Wiltshire, Oxfordshire and North Somerset where we do have known ancestry.


Conclusion
I now have a lot of new matches to work with, and it's going to be a great help having my parents' results available for comparison. With autosomal DNA it always helps to test as many close relatives as possible. If you can't test your parents you should try and test aunts and uncles, siblings and cousins to get the best possible representation of the DNA of all your ancestors.

Sunday, 17 May 2015

Three generations of FTDNA MyOrigins admixture results

I wrote last week about comparing my admixture results from Ancestry, 23andMe and Family Tree DNA. At Family Tree DNA I have now tested three generations of my family so I thought it would be interesting to compare the MyOrigins results across the generations. I've provided below a summary of the ancestry for each person tested together with a screenshot of their results. Click on the images to enlarge them.

Debbie's dad
Four grandparents born in England: Bristol, Gloucestershire, London (x2).

Eight great-grandparents born in England: Bristol (x2), Devon, Essex, Gloucestershire, Hertfordshire, London (x2).

Fifteen great-great grandparents born in England: Devon (x2), Bristol, Essex, Gloucestershire, Hertfordshire (x 2), London.
One great-great grandparent born in Scotland (location not known).
The birthplace of seven of his English great-great-grandparents is unknown. Four were probably born in Bristol or in a nearby county. Three were Londoners who could have moved to London from anywhere in England.


Debbie's mum
Four grandparents born in England: London (x2), Hampshire (x2).

Eight great-grandparents born in England: Berkshire, Hampshire, London (x3), Somerset, Wiltshire. The birthplace of one great-grandparent is not known but he was probably born in London.

Fifteen great-great-grandparents born in England: Bedfordshire, Berkshire (x2), Gloucestershire, Hampshire (x2), Hertfordshire, London (x2), Somerset (x2), Wiltshire.
One great-great-grandparent born in Ireland: County Kerry.
The birthplace of three of her English great-great-grandparents is unknown. One was probably born in Hampshire. The other two were probably Londoners who could have come from anywhere in the country.


Debbie
Four grandparents born in England: Bristol, London (x3).

Eight great-grandparents born in England: Bristol, Gloucestershire, Hampshire (x2), London (x4).

Sixteen great-great-grandparents born in England: Berkshire, Bristol (x2), Devon, Essex, Gloucestershire, Hampshire, Hertfordshire, London (x5), Somerset and Wiltshire.
The one great-great-grandparent with an unknown birth location was probably born in London.

Twenty-four great-great-great grandparents born in England: Berkshire (x2), Bristol, Devon (x2), Essex, Gloucestershire (x2), Hampshire (x2), Hertfordshire (x3), London (x5), Somerset (x2), Wiltshire.
One great-great-great grandparent born in Ireland: County Kerry
One great-great-great grandparent born in Scotland (location not known).
The birthplace of the remaining eight English great-great-great-grandparents is unknown but they were probably born in Bristol, London and Hampshire.


Debbie's husband
Four grandparents born in England: Cambridgeshire (x2), Cumberland, Devon.

Eight great-grandparents born in England: Cambridgeshire (x3), Devon (x2), Dorset, Somerset, Surrey.

Sixteen great-great grandparents born in England: Cambridgeshire (x3), Devon (x4), Hampshire, Herefordshire, Hertfordshire, Huntingdonshire (x2), Somerset (x2), Surrey (x2). 

Twenty-six great-great-great grandparents born in England: Cambridgeshire (x6), Devon (x8), Hampshire, Herefordshire (x2), Huntingdonshire, Somerset (x4), Surrey (x3), Sussex.
The birthplace of his remaining six English great-great-great-grandparents is unknown. Three were probably born in Cambridgeshire, two in Hertfordshire and one in Surrey.


Debbie's eldest son


As can be seen, there is considerable variation between family members. This is only to be expected because of the random nature of DNA inheritance. However, some of the differences are somewhat more extreme than might be intuitively expected. For example, 57% of my DNA matches the British Isles cluster whereas only 40% of my dad's DNA matches the British Isles and only 7% of my mum's DNA.
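
A rough sanity check on those figures can be sketched as follows. It assumes, purely for the sake of argument, that a cluster label is a fixed property of a stretch of DNA that is passed on unchanged. Under that assumption a child's percentage averages the mean of the two parents' percentages, and can never exceed the sum of what each parent could pass on. The function names are hypothetical.

```python
def expected_child_share(dad_pct, mum_pct):
    """Average expectation: half of the child's genome comes from each parent."""
    return (dad_pct + mum_pct) / 2


def max_child_share(dad_pct, mum_pct):
    """Upper bound on the child's percentage under the fixed-label assumption.

    A parent whose genome is p% from a cluster can pass on a single copy
    that is at most min(100, 2p)% from that cluster, which amounts to
    min(50, p) percentage points of the child's genome.
    """
    return min(50, dad_pct) + min(50, mum_pct)


# British Isles percentages reported by MyOrigins, as quoted above.
dad, mum, child = 40, 7, 57

print("expected:", expected_child_share(dad, mum))
print("maximum:", max_child_share(dad, mum))
print("reported:", child)
```

My reported 57% is above even the upper bound of 47%, which suggests that the differences reflect the way the estimates are calculated for each person and not just the random nature of inheritance.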

My dad, my husband and my son all come out with small percentages of "Middle Eastern" DNA. My husband and son's "Middle Eastern" DNA appears on the map over Turkey, Georgia and Azerbaijan whereas my dad's supposedly Middle Eastern DNA is centred over Egypt and Jordan. I've noticed that a significant proportion of the members of my Devon DNA Project with predominantly British ancestry are coming out with these small percentages of "Middle Eastern". Clearly this does not mean that any of them have recent ancestry from the Middle East, and it is probably related to the limitations of the available reference populations. I hope to look at the Middle Eastern issue in more depth in a subsequent blog post.

As I mentioned in my previous blog post, these admixture results should really only be used for entertainment value at present. However, the results are likely to change over time as more reference populations become available and as the methodology improves. Admixture tests should really be regarded as a bonus feature of an autosomal DNA test and should not be the primary purpose for testing. While your admixture results will not tell you whether your great-great-grandfather was Scottish or Irish, you might find instead that you match with a cousin who is descended from the ancestor of interest who will be able to fill in the blanks in your family tree. Cousin-matching tests will become increasingly useful as the databases continue to grow in size.

Related blog posts

© 2015 Debbie Kennett

Saturday, 16 May 2015

Comparing admixture results from AncestryDNA, 23andMe and Family Tree DNA

I have now taken an autosomal DNA cousin-matching test at all three testing companies – 23andMe, AncestryDNA and Family Tree DNA. With this type of test you also get as a bonus feature a report of your admixture percentages. I thought it would be a useful exercise to do a comparison of my admixture results from all three companies.

All my known ancestors on all my lines within the last 500 years are from the British Isles. Here is a breakdown on a generation by generation basis:

- All four of my grandparents were born in England. One grandparent was born in Bristol, and my other three grandparents were born in London.

- All eight of my great-grandparents were born in England. Four of my great-grandparents were born in London, two were born in Hampshire, one was born in Bristol and one was born in Gloucestershire.

- I know the birthplaces of 15 of my 16 great-great grandparents and they were all born in England in the following locations: Berkshire, Bristol (x2), Devon, Essex, Gloucestershire, Hampshire, Hertfordshire, London (x5), Somerset and Wiltshire. My great-great grandparent with an unknown birth location was very likely to have been born in London.

- I know the birthplaces of 24 of my 32 great-great-great grandparents. I have one ggg grandmother who was born in Ireland, and one ggg grandfather who was born in Scotland. My other ggg grandparents were all born in England in the following locations: Bedfordshire, Berkshire (x2), Devon (x2), Bristol, Essex, Gloucestershire (x2), Hampshire (x2), Hertfordshire (x3), London (x5), Somerset (x2), Wiltshire. My other eight ggg grandparents are all most likely to have been born in England, probably in Bristol, London and Hampshire.

I am probably fairly typical of someone with ancestry from the south and west of England whose ancestry has been filtered through the melting pot of London.

Here is my Ethnicity Estimate from AncestryDNA. According to the AncestryDNA FAQs (Interpreting my results Q5) the test can "reach back hundreds, maybe even a thousand years, to tell you things that aren't in historical records". In the Ethnicity Estimate White Paper AncestryDNA caution that "Genetic estimates of ethnicity also go back thousands of years, beyond the end of a pedigree paper trail. Regions identified as “populations” in a pedigree may have been very different thousands of years ago, and so may be represented differently in a genetic ethnicity estimate."


Here is the MyOrigins report from Family Tree DNA. The timeframe for the genetic clusters is not given but in the MyOrigins White Paper it is stated that the clusters "span extant modern human genetic variation" but are also "reflective of ancient migrations and admixtures".


23andMe provide the most sophisticated tools. They offer three different Ancestry Composition reports - conservative, standard and speculative. They tell us that these results "reflect where your ancestors lived before the widespread migrations of the past few hundred years".

Here is my conservative estimate:


Here is my standard estimate:


Here is my speculative estimate:


23andMe also provide a chromosome view which shows the breakdown of your admixture across all your different chromosomes. Here is my chromosome view in the speculative mode:

As can be seen, these admixture percentages bear little resemblance to my documented pedigree, and when the companies try to break down Europe into individual countries they come out with quite variable results. It is perhaps only to be expected considering that there is a very limited range of reference populations available. The companies all supplement the publicly available datasets by using samples from their customer databases, but they still only have a very small number of samples from the British Isles. Here is a list of reference samples for Britain and Ireland for each of the three companies.

AncestryDNA reference samples
Great Britain: 111
Ireland: 138
Source: The AncestryDNA Reference Panel (version 2.0) (available to AncestryDNA customers)

Family Tree DNA MyOrigins reference samples
British: 39
Irish: 45
Scottish: 43

23andMe reference samples

As can be seen, both Ancestry and Family Tree DNA have very small sample sizes from the British Isles. They have not made any attempt to split the samples into constituent countries. Northern Ireland would be expected to be genetically very similar to Scotland but we don't know if Ancestry's Irish samples are from the north or the south of Ireland or from both countries combined. We don't know where in Great Britain their samples were taken from. Family Tree DNA seem to think that Scotland has already separated from Britain and is a country in its own right! In view of this, it is not clear if their British samples also include people of Scottish ancestry or if they now think that Britain only consists of England and Wales. It is also not known if their Irish samples are for people with ancestors from the whole of Ireland or just from the Republic of Ireland.

23andMe have the benefit of a larger dataset but this has not improved the accuracy of their reports, and they have a confusing grasp of geography. They label a cluster as "British and Irish" but describe samples collected from the UK and Ireland. Do they realise that Northern Ireland is part of the United Kingdom? One wonders if customers with ancestors from Northern Ireland described themselves as from Ireland or the UK. It would have made more sense to ask people to define which country within the British Isles their ancestors came from rather than providing two confusing and overlapping options.

In view of these limitations it is therefore not surprising that we often see some bizarre results. For example, it is often the case that Americans come out with much higher percentages of "British" ancestry with these tests than "native" Brits like me. Americans sometimes have surprisingly high percentages of "British" of 80% or more.

The lack of defined reference samples from specific countries within the British Isles also sometimes gives confusing results. I have one project member with seven of his eight great-grandparents born in Wales and one great-grandparent born in Devon. At AncestryDNA he comes out as 64% Irish and 12% Great Britain, 12% Scandinavia and 11% Trace Regions. At Family Tree DNA his ancestry is reported as being 97% from the British Isles and 3% from Finland and Siberia.

There are no doubt problems with the sampling in other countries too, which produce similarly misleading results. Joss ar Gall, who writes the Le Gall of Lower Britanny blog, is French and all his ancestry is from Brittany yet, according to his 23andMe test, he is only 19% French. AncestryDNA assigned him no French DNA at all but found that he was 46% British and 10% Irish. One would expect many similarities between the French and the English, but clusters which clearly cross country borders should not be labelled so specifically because people are misled and take the labels too literally.

Admixture tests really need to be used for entertainment purposes only at the present time, and the results should be taken with a very large pinch of salt. However, the tests can sometimes provide useful insights. Generally it is possible to distinguish between populations at the continental level (eg Asian, African and European) provided you're from a population that is not close to a continental border. Admixture from endogamous populations such as Ashkenazi Jews and Finns can also be detected with reasonable confidence. However, it is not possible to distinguish between populations within individual European countries and it may never be possible to do so because our ancestry is so complicated.

Population level comparisons
While admixture results at the individual level are not particularly meaningful there is much more insight to be gained when the results of these tests are compared at the population level.

AncestryDNA have published a few very interesting blog posts with some nice maps comparing the admixture percentages of their British and Irish testers:

- What does our DNA tell us about being Irish by Mike Mulligan, Ancestry blog, 16 March 2015
- Exploring our DNA – Europe West by Mike Mulligan, Ancestry blog, 10 April 2015
- AncestryDNA - The Viking in the room by Mike Mulligan, Ancestry blog, 23 June 2015

AncestryDNA also did a similar exercise with their American testers and produced a genetic census of America with a range of maps showing the contribution of the different admixtures to the American population.

Ancestry also provide a useful bar chart which is hidden away in their help menu showing the differences between the various clusters. The chart below shows the differences between European regions. To access the chart click on the question mark in the top right of your screen from your ethnicity estimate page to open up the help and tips menu. Then click on "Why you might have more (or less) from a certain region". There is also a chart which will give you the breakdown for all 26 clusters.


A group of 23andMe scientists published a fascinating paper earlier this year in the American Journal of Human Genetics on The genetic ancestry of African Americans, Latinos, and European Americans across the United States (Bryc, Durand, Macpherson et al 2015).

The future
While these admixture tests will probably never give us all the answers we want, they will no doubt improve over time as better reference samples become available. We are already on the second incarnation of these tests at all three companies and we can expect to see many more improvements in the years to come. I would hope that all three companies will eventually be able to have access to the dataset from the People of the British Isles Project which should give improved estimates for people of British ancestry. It would also help if the reference samples were collected more carefully and with precise countries of origin clearly defined.

This article was updated on 17th May 2015 to include the screenshot of the bar chart from AncestryDNA showing the admixture breakdown within Europe. The article was updated on 18th May to include a mention of AncestryDNA's genetic census of America. The article was updated on 1st January 2017 to include a link to an AncestryDNA blog post on the percentages of Scandinavian DNA found in British and Irish testers.

FURTHER READING
My related blog posts
23andMe
AncestryDNA Ethnicity Estimate
FTDNA MyOrigins

Saturday, 3 May 2014

Driving in the wrong direction with a dodgy DNA satnav

I've been receiving a lot of questions in the last couple of days about the new DNA "satnav" tool called GPS (Geographic Population Structure) which purports to pinpoint the village that your ancestors lived in one thousand years ago. See, for example, the articles in the Daily Mail and the Washington Post. There was also some prominent and uncritical coverage on BBC Breakfast News on Thursday featuring a segment in which the BBC weather presenter Carol Kirkwood was given the results of her DNA test on air and told that her ancestors were from the town of Crieff in Scotland. As Chris Jiggins has pointed out on Twitter the acronym GPS seems to have been chosen deliberately to "promote a completely false sense of accuracy".  

The company which is offering this service is a new start-up by the name of Prosapia Genetics, which has been set up by Tatiana Tatarinova from the Children's Hospital Los Angeles. The company proudly proclaim on their website: "Our first tool, GPS, will tell you where your DNA was forged, and is accurate to home village with a time resolution of the past 1,000 years."

The reports are based on an analysis of autosomal SNPs. You can either order a test through Prosapia Genetics, who appear to have an affiliate relationship with Family Tree DNA, or you can submit your raw data file from a test you've already taken with one of the companies that offers autosomal DNA testing - AncestryDNA (US only), 23andMe, Family Tree DNA, Geno 2.0 or BritainsDNA/ScotlandsDNA. A range of reports is offered with prices varying depending on the number of reference populations used for the analysis. The reports simply give you a set of geographical co-ordinates, which are supposed to represent the "ancient home" of all of your ancestors, and a map showing where your ancestors lived. We are now getting feedback from a number of people who've paid for this service and it would appear, not surprisingly, that the reality does not match the hype.

Julie Matthews bought the Basic Test, which covers 100 reference populations. She commented in the Facebook R1b-L21 group:
I spent $29 to discover that my "homeland" was in the middle of the River Humber in England. I knew we all descended from fish - here's proof. Don't waste your money!
Teresa Vega paid for the Super Test, which includes 500 reference populations. She writes in the ISOGG Facebook group:
Totally unconvincing. Stupid me paid $42.99 for nada! My ancestral home is smack dab west of Puerto Rico in the Atlantic Ocean! I learned nothing and it told me to upgrade to another test for more detailed results -- a test they don't even have listed! Don't believe the hype!!!!
Teresa's report can be seen online here.

JoAnn O'Linger had a similarly misleading result. She reports in the ISOGG Facebook group:
I had a similarly disappointing result from Prosapia (paid for), it was the "Super Test" as well: 
" JoAnn ordered a Super GPS Test of her DNA data. We found the following GPS Co-ordinates : Latitude 56.7811288256845 and Longitude 4.26921663910535 
A map pointing the location is given below with a short guide on how to interpret this results.
How to interpret your results? 
GPS coordinates indicate the place where your DNA was forged before your family may have moved to your current location. Because borders changed throughout history, your ancestors may have been part of an ancient country once ruled the region. If your GPS coordinates are in the water, it indicates mixture between two populations on the two ends of the body of water, in which case we suggest you register to the upcoming GPS2 tool that would provide you with the origins of your parents. If you wish to learn more about your past, we suggest you try the Advanced test or the Super test, which provide much higher accuracy." 
JoAnn says: "Those coordinates are squarely in the North Sea, which does make sense as I am the typical American mutt, with mostly Irish and English heritage, but if one goes further back, much of that is from Norman French and Gaelic-Norse Orcadians. So it makes sense, but in my opinion it's not worth the high price."
Prosapia Genetics have a Forum where you can read the comments from their customers, many of whom have expressed similar disappointment at the service offered:

http://prosapiagenetics.com/community/viewforum.php?f=2

[Update 10th May 2014 The Prosapia Genetics Forum is now restricted to members only. I am told that complaints and negative comments have been deleted and comments are being moderated.]

This is not surprising as the whole concept of the test is fundamentally flawed. If we assume 30 years per generation and go back 35 generations, or 1,050 years, we will theoretically have 34,359,738,368 ancestors in that generation alone. This figure does of course exceed the population of the world at that time and in reality there will be lots of pedigree collapse which will reduce the number of ancestors considerably. Even so, the mind-boggling figures demonstrate that it is quite meaningless to try and pinpoint a single geographical location as the origin of all those diverse ancestors one thousand years ago. Furthermore, we only inherit the DNA of a tiny subset of our ancestors. To understand why this is the case read Luke Jostins' blog post "How many ancestors share our DNA" and the posts from Graham Coop and Blaine Bettinger that are linked in that article.
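
The doubling arithmetic is easy to check. The short Python sketch below counts theoretical ancestor "slots" per generation, ignoring pedigree collapse entirely; the function names are my own and the 30-year generation length is simply the assumption used above.

```python
# Theoretical ancestor slots, ignoring pedigree collapse.
YEARS_PER_GENERATION = 30  # assumption taken from the text above


def ancestors_in_generation(n: int) -> int:
    """Ancestor slots exactly n generations back (parents are n = 1)."""
    return 2 ** n


def cumulative_ancestors(n: int) -> int:
    """Total ancestor slots across generations 1 to n inclusive."""
    return 2 ** (n + 1) - 2


for n in (5, 10, 20, 35):
    years_back = n * YEARS_PER_GENERATION
    print(
        f"{n:>2} generations (~{years_back} years): "
        f"{ancestors_in_generation(n):,} in that generation, "
        f"{cumulative_ancestors(n):,} in total"
    )
```

The 35th generation on its own holds 2^35 slots, around 34.4 billion, and the running total across all 35 generations is roughly double that. Real family trees contain far fewer distinct people because the same ancestors fill many slots.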

Even if it were possible to pinpoint a single location to represent our millions of ancestors from a thousand years ago, we would need accurate "maps" in the form of carefully sampled reference populations in order to be able to use our DNA satnav. Unfortunately, we only have a limited number of reference populations available, many of which have been sampled for medical purposes with no attempt made to collect the relevant "co-ordinates" in the form of  detailed genealogical information. Consequently, any maps included in a reference genome "satnav" are going to have massive black holes. It is therefore not surprising that this DNA satnav is misdirecting people into rivers and oceans!
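
To see why a single set of co-ordinates can so easily end up in the water, consider the toy Python sketch below. It is not the GPS algorithm from the paper. It simply takes a weighted average of the co-ordinates of two stand-in "reference populations", which is the simplest possible way of placing a mixed individual between two sources. The city co-ordinates are approximate and the 50/50 weights are invented for illustration.

```python
def weighted_centroid(points):
    """Weighted mean of (lat, lon, weight) tuples.

    A plain arithmetic mean of latitude and longitude. This is only a
    rough approximation on a sphere, but it is good enough to show the idea.
    """
    total = sum(weight for _, _, weight in points)
    lat = sum(lat * weight for lat, _, weight in points) / total
    lon = sum(lon * weight for _, lon, weight in points) / total
    return lat, lon


# Approximate city co-ordinates, used purely as stand-ins for two
# hypothetical reference populations (illustrative assumption).
DUBLIN = (53.35, -6.26)
BERGEN = (60.39, 5.32)

# Hypothetical individual with equal affinity to both populations.
lat, lon = weighted_centroid([(*DUBLIN, 0.5), (*BERGEN, 0.5)])
print(f"Averaged 'homeland': {lat:.2f}, {lon:.2f}")
```

The averaged point comes out at roughly 56.87 N, 0.47 W, which as far as I can tell is open water in the North Sea. Nobody's ancestors lived there. The location is an artefact of averaging two places on dry land.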

The methodology behind the GPS tool was outlined in a paper by Elhaik et al entitled Geographic population structure analysis of worldwide human populations infers their biogeographical origins. The paper was published in the scientific journal Nature Communications. Despite the fact that the Prosapia Genetics website appears to have been launched on the same day that the paper was published, Tatiana Tatarinova, the founder of the Prosapia Genetics website and one of the lead authors, has not declared any "competing financial interests". The paper has already been the subject of controversy. The technique described in the paper offers nothing new and it is claimed that the methodology has been copied from that used by the blogger Dienekes Pontikos, who writes under a pseudonym. For background see Dienekes' two blog posts on the subject:

- Nature Communications, the Genographic Project, Elhaik et al. re-discover zombies, the Oracle, etc. 3 years after the fact...
- The Geographic Position Structure (GPS) algorithm of Elhaik et al. (2014) is basically wrong

See in particular the comments section of the first of the above two posts where Eran Elhaik attempts to defend himself against the charge of plagiarism.

Joe Pickrell, one of the reviewers of the paper, has posted a summary of his critique which is well worth a read. The review can be found here:

http://jkplab.org/2014/04/30/review-geographic-population-structure-gps-of-worldwide-human-populations-infers-biogeographical-origin/

The authors themselves concede in the paper that the technique has its limitations and will only work if "the appropriate samples are available in the reference population data set". They appear to have cherry-picked some conveniently isolated populations such as the Sardinians for the purposes of their study, but the technique did not work for other populations:
To test GPS’s accuracy with individuals from populations that were not included in the reference population set, we conducted two analyses. We first repeated the previous analysis using the leave-one-out procedure at the population level. As expected, GPS accuracy decreased with 50% of worldwide individuals predicted to be 450 km away from their true origin. The predicted distance increased to 1,100 and 1,750 km for 80 and 90% of the individuals, respectively (Fig. 4a). Because GPS best localizes individuals surrounded by M genetically related populations, populations from island nations (for example, Japan and United Kingdom) or populations whose most related populations were under-represented in our reference population data set (for example, Peru and Russia) were most poorly predicted. Consequently, the median distances to the true origin were much smaller for individuals residing in Europe (250 km), Africa (300 km) and Asia (450 km) due to their being more commonly represented in the reference population data set compared with Native Americans and Oceanians. These results represent the upper limit of GPS’s accuracy when the specific population of the test individual is absent from the reference population data set.
A hyped-up press release was issued by the University of Sheffield which also includes a link to a video on YouTube. As is often the case, the media have picked up on the hype in the press release and have made no attempt to read the scientific paper and understand the limitations of the methodology. I hope that there have not been too many people who have paid out good money for these misleading DNA satnav reports.

Note that if you've taken a test with one of the genetic genealogy companies there are many free services that you can use to get an alternative reading of your data and a prediction of your "ethnicity", all of which will give much better results than the commercial offerings from Prosapia. One of the best free websites is GEDmatch which allows you to get readings from a wide range of different services. You can find a full list of services in the ISOGG Wiki article on admixture analyses. However, it is still very difficult to distinguish between populations at anything more than the continental level, and all such reports should be treated with a very large pinch of salt.

Update 6 May 2014
Teresa Vega now tells me that she has received a full refund for her test from PayPal. She told PayPal that she had felt misled by the company's claims and she was unhappy that they had recommended upgrading to a test that they did not even have on their site. JoAnn O'Linger is now also in the process of applying for a refund.

Update 3rd September 2014
Although the Prosapia Genetics domain name was originally registered to Dr Tatiana Tatarinova, it was subsequently transferred to Vladimir Makarov.

Update 30th May 2015
In April 2015 Dr Eran Elhaik gave a presentation at Who Do You Think You Are? Live on the subject of "Reaching the Holy Grail in genetic genealogy: from genome to home village". For further details see the summary on the DNA sat nav page on the UCL website. In particular do listen to the recording of the exchange in the Q&A session between Eran Elhaik and Professor Mark Thomas.

Update 16th July 2016
A new paper by Pavel Flegontov, Alexei Kassian, Mark G. Thomas, Valentina Fedchenko, Piya Changmai and George Starostin "Pitfalls of the geographic population structure (GPS) approach applied to human genetic history: a case study of Ashkenazi Jews" provides a critique of the GPS methodology used for the Prosapia Genetics test with specific reference to its application to infer the origins of the Yiddish language.

Update 31st October 2016
A corrigendum to the Elhaik et al 2014 paper on geographic population structure has been published by Nature Communications. It contains a conflict of interests statement from the authors. The statement includes an acknowledgement that one of the authors (Tatiana Tatarinova) has a link with Prosapia Genetics.

Acknowledgements
Many thanks to Julie Matthews, JoAnn O'Linger and Teresa Vega for permission to use their quotes and reports.

Related blog posts
- My letter in Family Tree Magazine about "genetic homeland" stories

See also
Since writing this article I have discovered other discussions on the subject. I have posted the relevant links below and will update the list if further links become available:
- Prosapia Genetics - Worth the money? A review by Lorine McGinnis Schulze
- Researchers develop DNA GPS tool to accurately trace geographical ancestry -  a discussion on the Reddit forum
- Is GPS DNA tracking too good to be true? An article by Peter Calver in the Lost Cousins newsletter, May 2014
- So many genes, so close to home by Matthew Thomas, BioNews, 12 May 2014.
- Ancestral home pinpointed by DNA by Julie Lutter, Family History Research by Jodi, 13 May 2014.

© 2014-2016 Debbie Kennett

Friday, 20 December 2013

A first look at the Chromo 2 All My Ancestry test from BritainsDNA

Larry Vick has very kindly shared some screenshots with me from his BritainsDNA* Chromo 2 test. Larry previously tested with the company in the days when it was known as Ethnoancestry. As an existing customer he was given the opportunity to order the Chromo 2 test at a very favourable price. Larry ordered the combination package which includes a Y-DNA (fatherline) test, an mtDNA (motherline) test and the All My Ancestry (biogeographical analysis) test. I will cover the All My Ancestry test in this post and will discuss the other tests in a follow-up post. Click on the images to enlarge them.

The All My Ancestry test analyses around 250,000 autosomal SNPs. The screenshot below shows the All My Ancestry welcome screen.  There are three different viewing options: Global Connections, Population Percentage, and Chromosome Painting.

The Global Connections menu compares your results with reference samples from around four thousand people from around the world. The results are plotted on a colour-coded chart and you can see which population is your closest match. There is a drop-down list which gives you the option to choose a variety of alternative views: Worldwide 1; Worldwide 2; African; Sub-Saharan; West Asian; South and Central Asian; East and Northern Asian; Hispanic and Afro-Caribbean 1; Hispanic and Afro-Caribbean 2; Native American mixture; and Jewish mixture. The image below shows the Worldwide 1 view.

This image shows the European view from the Global Connections menu:

The next menu allows you to look at your Population Percentages. This plot "uses a population genetic model to estimate your overall ancestry and puts this in context using nearly 4000 people from across the world". There is again a drop-down list which allows you to see your results compared to a number of different populations. The following options are available: global; Africa; Europe; West Asia; South and Central Asia; East and North Asia; and Hispanic-Afro-Caribbean-Native American. The following screenshot shows the global comparison:

This screenshot shows the European comparison:

The final viewing option is the Chromosome Painting. This allows you to see the contribution made to each of your chromosomes from three broad population groups: West Eurasian, Sub Saharan African and Asian-Native American. The company say "An ancestor six or more generations ago will have only contributed a small segment of DNA to your genome but this method can see these small segments which are not obvious in methods which provide a summary of the whole genome." As can be seen from the painting below this method has detected what are possibly small segments of African and Native American DNA.

Larry describes his documented ancestry as follows.
I think my ancestry would be best described as colonial American with a lot of UK, significant Irish, some German, and a little African and Native American mixture. I am not sure of the amount of African and Native American. My mother's 2nd great grandmother was from an area of tri-racial people, and I have no idea as to who her African or Native American ancestors were (in fact I don't even know my mother's 2nd great grandmother's parents' names). My mother's ancestry paintings have merely supported a family story about her 2nd great grandmother being from Newman's Ridge in Hancock County, Tennessee.  This 2nd great grandmother's maiden name was COLLINS, and that is a very prominent Melungeon surname (the reputed founder of this area was Vardy COLLINS).
I have to say I've personally never been able to work up much enthusiasm for these admixture tests. I already know that all my documented ancestors are from the British Isles and none of the tests are as yet able to tell me anything more than I already know from my genealogical research. However, Larry's genetic ancestry is much more interesting than mine which makes the results a little more appealing. I particularly liked the very colourful population percentage plots, a feature which is not available from any of the other testing companies. It would have been helpful to have information on the reference populations used for the analysis, something which all the other companies now provide. Another feature which is lacking is the ability to download the raw autosomal data. Compared to the alternative offerings on the market the All My Ancestry test is rather expensive at £169 ($269). It is a little cheaper when bought as part of a package in combination with the Chromo 2 Y-DNA and mtDNA tests. For a comparison of the autosomal DNA offerings from the other testing companies see Tim Janzen's autosomal DNA testing comparison chart in the ISOGG Wiki. The BritainsDNA All My Ancestry test is not yet included on this chart but Tim will no doubt wish to update it in due course when he's had the chance to assess his own results.

* Note that BritainsDNA also trades under the names ScotlandsDNA, IrelandsDNA, YorkshiresDNA and CymruDNAWales.

Related blog posts
- A first look at the BritainsDNA Chromo 2 Y-DNA and mtDNA tests
- Alistair Moffat, BritainsDNA and the BBC - a "uniquely British farce"
- More pseudoscience from Alistair Moffat on the BBC
- BritainsDNA, the BBC and Eddie Izzard
- The British: a genetic muddle by Alistair Moffat
- BritainsDNA, The Times and Prince William: the perils of publication by press release
- The saga continues - CymruDNAWales, S4C, the Tudor surname and "Who are the Welsh?"
- More on the S4C DNACymru controversy and my review of "Who are the Welsh?"

© 2013 Debbie Kennett

Tuesday, 17 September 2013

My updated ethnicity results from AncestryDNA - a British perspective

AncestryDNA announced last week that they were starting to roll out a free update to their ethnicity results. I noticed today that my updated results were now available. The beta version of AncestryDNA's ethnicity results was widely criticised. Many American customers found that they had much higher percentages of Scandinavian ancestry than expected. As one of the few British customers in the AncestryDNA database I was surprised to find that many of my American friends and genetic cousins had significantly higher percentages of "British" ancestry than me. AncestryDNA also failed to provide any background information on the reference populations used, thus rendering the results essentially meaningless. The new ethnicity results are a slight improvement but, as with all these admixture analyses, still have a long way to go before they can provide any useful information.

When you sign into your Ancestry account you are first of all presented with your old ethnicity results. If you have access to the new ethnicity results you will see a big orange label to click on. As can be seen, my original results from AncestryDNA were 58% Central European, 28% British Isles, 13% European and 4% uncertain.
According to my family history research all my documented ancestors as far back as I can trace them are from the British Isles and predominantly from England. I know the names and birth places of 15 of my 16 great-great-grandparents and they are all English. In this generation I have one illegitimate line which has prevented me from finding out the name of the remaining ancestor. The birthplaces of these 15 great-great-grandparents are: Burrington, Devon; Bristol (2); Thornbury, Gloucestershire; Clapham, London; Colchester, Essex; Sandon, Hertfordshire; Limehouse, London; Bermondsey, London; Merriott, Somerset; Sydenham, Kent; Sydmonton, Hampshire; Kintbury, Berkshire; Westminster, London; Sherston, Wiltshire.

I know the names of 27 of my 32 great-great-great-grandparents, but I only know the birth places of 21 of these ancestors. All of my known ancestors in this generation are again from the British Isles. These are the birth places where known: Ashreigney, Devon; Mariansleigh, Devon; Thornbury, Gloucestershire; Bristol; Great Yeldham, Essex; Preston, Hertfordshire; Sandon, Hertfordshire; Scotland (place not known); Hackney, London; Laverstoke, Hampshire; County Kerry, Ireland; Merriott, Somerset; Rickmansworth, Hertfordshire; Shoreditch, London; Ecchinswell, Hampshire; Welford, Berkshire; Kintbury, Berkshire; Salford, Bedfordshire; Holborn, London; Leighterton, Gloucestershire; Purton, Wiltshire.

The new Ethnicity Estimate 2.0 from AncestryDNA divides the population clusters into 26 global regions. Europe is subdivided into the following regions: Great Britain, Ireland, Europe West, Iberian Peninsula, Finnish/Northern Russia, Italy/Greece, Scandinavia, Europe East and European Jewish. My updated ethnicity percentages from AncestryDNA can be seen below. The percentages are as follows: Europe West 47%, Great Britain 21%, Ireland 20%, Iberian Peninsula 8%, Finnish/Northern Russia 2%, Italy/Greece <1%, Scandinavia <1%.
Ancestry provide somewhat contradictory information on the number of SNPs used for the ethnicity inferences. In their introductory help pages they state that they have increased the number of comparison points (markers) used to determine ethnicity from 30,000 to 300,000. Elsewhere they tell us that they are using "100,000 highly informative SNPs". Your DNA is now analysed more than 40 times to come up with the best estimate and a personalised range. The screenshot below shows the range of results for my "Great Britain" admixture which varied from a low of 0% to a high of 49% in the 40 runs through my DNA. The midpoint of 21% was picked as the best estimate. My results were then compared with "natives" from the region. A "typical native" of Great Britain supposedly has 60% admixture from Great Britain.
Ancestry explain that what they call the "Great Britain region" is "more admixed than most other regions". They provide examples from their reference populations showing the range of results found with percentages varying from 41% to 100% (see the screenshot below). My 21% from Great Britain obviously makes me a very untypical native! However, the only other British person I know who has tested with AncestryDNA has actually come out even less "British" than me with just 10% admixture from Great Britain and 12% from Ireland. In contrast the American genetic genealogy blogger Blaine Bettinger has reported that his AncestryDNA results show that 55% of his admixture is from Great Britain and 7% is from Ireland. Another American blogger, Judy Russell, who writes the popular Legal Genealogist blog, now finds that, according to AncestryDNA, 49% of her admixture is from Great Britain. I note, however, that the reference population for the "Great Britain region" consists of a mere 195 samples, which is nowhere near adequate to represent the genetic diversity of a population of over 61 million. Ancestry also have a reference population of just 154 people to represent the people of Ireland, and just 416 samples to represent the "Europe West" region which encompasses France, Germany, Switzerland, Austria, the Low Countries, the Czech Republic and northern Italy.
Ancestry also show the percentages from other regions that were found in their Great Britain reference samples:
Ancestry have now provided more details about the reference populations used for their analysis, and have provided a detailed White Paper explaining the methodology behind the calculations. They explain that the reference panel was compiled from "a set of 4,245 DNA samples collected from people whose genealogy suggests they are native to one region". The reference panel candidates included "over 800 HGDP samples, over 1,500 samples from the proprietary AncestryDNA reference collection, and over 1,800 AncestryDNA customers who have explicitly consented to be included in the reference panel". These 4,245 samples were whittled down to provide a final reference panel of 3,000 samples. The 195 samples from Great Britain were reduced to just 111 samples in this process, and the number of samples from Ireland was cut from 154 to 138.
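
The proportions behind these figures are worth working out. The Python sketch below uses only the numbers quoted above. The source counts are given as "over 800", "over 1,500" and "over 1,800", so the customer share is a lower bound, and the variable names are my own.

```python
TOTAL_CANDIDATES = 4245   # candidate samples before filtering
FINAL_PANEL = 3000        # samples in the final reference panel
CUSTOMER_SAMPLES = 1800   # quoted as "over 1,800", so a lower bound
UK_POPULATION_MILLIONS = 61  # "over 61 million", as quoted above

# (candidate samples, samples kept) for the two regions with both figures
REGIONS = {
    "Great Britain": (195, 111),
    "Ireland": (154, 138),
}


def pct(part: float, whole: float) -> float:
    """Percentage of whole represented by part, to one decimal place."""
    return round(100 * part / whole, 1)


panel_kept = pct(FINAL_PANEL, TOTAL_CANDIDATES)
customer_share = pct(CUSTOMER_SAMPLES, TOTAL_CANDIDATES)
region_kept = {name: pct(kept, start) for name, (start, kept) in REGIONS.items()}
gb_per_million = round(REGIONS["Great Britain"][0] / UK_POPULATION_MILLIONS, 1)

print(f"Candidates kept overall: {panel_kept}%")
print(f"Customer share of candidates: at least {customer_share}%")
for name, kept in region_kept.items():
    print(f"{name} samples kept: {kept}%")
print(f"Great Britain candidates per million people: {gb_per_million}")
```

On these figures about 71% of the candidate samples survived overall, but only about 57% of the Great Britain samples did, compared with about 90% of the Irish samples. Even before filtering there were only around three Great Britain samples for every million people.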

It is not explicitly stated but I presume that the proprietary reference collection is the Sorenson Molecular Genealogy Foundation database which Ancestry acquired in March 2012. The participants in the SMGF database provided their samples for a non-commercial research project and not for use by a large profit-making company. If the SMGF samples were re-analysed by AncestryDNA then they would be ethically obliged to get consent from the participants for the re-use of their data. It is not clear if this has actually happened.

Almost half of the samples used in the AncestryDNA reference panel were provided by AncestryDNA customers. I presume that these are customers who signed the consent form to participate in AncestryDNA's Human Genetic Diversity Project. As I have written previously, I decided not to participate in this project as I could find no published information to describe what the project entailed. I was also concerned at the somewhat deceptive way in which the consent form was muddled up with the standard terms and conditions, potentially allowing people to join the "project" without providing their informed consent. The AncestryDNA test is currently only on sale in the US. I am one of only a handful of people outside the US who ordered the test in the beta-testing phase before Ancestry stopped shipping kits overseas. Therefore almost half the so-called reference samples provided for the AncestryDNA test are provided by Americans. This will inevitably introduce biases into the reference samples as the people who emigrated to America will not necessarily constitute a random sample of the population of Europe. For example, disproportionate numbers of people emigrated to America from Ireland. This bias no doubt explains why, in the few results seen so far, British people are coming out with much lower percentages from the "Great Britain region" than their American counterparts. Americans of British origin will no doubt be a good proxy for other Americans of British origin but it makes no sense to use British Americans as a reference population for "native" British people. Ancestry do also make it clear in their White Paper that they had difficulty differentiating the population of Great Britain from the rest of Western Europe. Samples from Great Britain were being "mis-assigned a significant amount of Western European ethnicity" and vice versa. My unexpectedly high Irish percentage is also presumably an artefact of the biased sampling process.

The use of an all-American reference population of AncestryDNA customers also explains the decision to lump England, Scotland and Wales together into one large "Great Britain region", and to mix the Republic of Ireland and Northern Ireland together into one "Ireland" region. It would have been much more interesting to split the British Isles up into the four constituent countries, but Ancestry clearly did not have sufficient samples with detailed genealogies from each country to do this, again because the reference samples were mostly from America rather than the British Isles. This once again calls into question Ancestry's decision to market their DNA test exclusively in the US. As most Americans are very interested in finding out more about their ancestry in Europe you would have thought it would be in Ancestry's interests to make their test available in other countries. This would have the added benefit of bringing in many more customers with four grandparents all born in the same country who could be used to provide more representative reference samples. If the AncestryDNA test is ever launched in other countries there is now going to be very little incentive for non-Americans to test as they will be overwhelmed with large numbers of distant cousins in America with little chance of ever finding the connection and no tools to filter out these large numbers of matches.

Ancestry do not provide detailed information about the timeframe which is covered by the new ethnicity estimates though they do explain that the results are provided as an "estimate of the ancient historical origins" of their customers' DNA. They add that "While this information is less relevant for genealogical research relating to the last five to ten generations, it may reveal intriguing clues about the distant history of one’s ancestors."

Even though my admixture results from the new Ethnicity Estimate 2.0 are no better than the estimates from the old beta test, Ancestry have at least responded to the criticisms and have now given details of the reference populations used and have provided us with a commendably detailed technical White Paper, though I cannot understand why such basic features were not included right from the outset.  It seems to me that AncestryDNA would have been better off investing their time and energy in providing much-needed matching segment data for their customers rather than tinkering with their "ethnicity" results. These admixture tests are still very much in their infancy and they currently have very little practical application for family history purposes. If you want to have some fun with your DNA results you can get alternative "readings" from the many people who provide a free analysis service. For further details see the ISOGG Wiki page on admixture analyses. In the meantime, if you wish to know your "ethnicity" you should carry on researching your family tree in the traditional way using the paper-based records.

© 2013 Debbie Kennett

Saturday, 8 December 2012

23andMe's new Ancestry Composition - a British perspective

23andMe's new Ancestry Painting feature, now known as Ancestry Composition, has just been launched. The old Ancestry Painting was only able to distinguish between three continental population groupings - European, Asian and African. I was a very boring and predictable 100% European.

Ancestry Composition provides a biogeographical analysis based on 22 reference populations. 23andMe have provided an excellent guide to the science behind Ancestry Composition which is well worth reading in order to get an understanding of how the analysis works. Ancestry Composition provides a number of different views showing your comparisons with global, regional and subregional populations at three different confidence thresholds - speculative (50%), standard (75%), and conservative (90%).

My documented ancestry is all from the British Isles. I know the names and birth places of 15 of my 16 great-great-grandparents and they are all English. In this generation I have one illegitimate line which has prevented me from finding out the name of the remaining ancestor. The birthplaces of these 15 great-great-grandparents are: Burrington, Devon; Bristol (2); Thornbury, Gloucestershire; Clapham, London; Colchester, Essex; Sandon, Hertfordshire; Limehouse, London; Bermondsey, London; Merriott, Somerset; Sydenham, Kent; Sydmonton, Hampshire; Kintbury, Berkshire; Westminster, London; Sherston, Wiltshire.

I know the names of 27 of my 32 great-great-great-grandparents, but I only know the birth places of 21 of these ancestors. All of my known ancestors are from the British Isles. These are the birth places where known: Ashreigney, Devon; Mariansleigh, Devon; Thornbury, Gloucestershire; Bristol; Great Yeldham, Essex; Preston, Hertfordshire; Sandon, Hertfordshire; Scotland (place not known); Hackney, London; Laverstoke, Hampshire; County Kerry, Ireland; Merriott, Somerset; Rickmansworth, Hertfordshire; Shoreditch, London; Ecchinswell, Hampshire; Welford, Berkshire; Kintbury, Berkshire; Salford, Bedfordshire; Holborn, London; Leighterton, Gloucestershire; Purton, Wiltshire.

Ancestry Composition gives me the following percentages:

Sub-regional Resolution
Standard Estimate
17.4% British and Irish
1.6% French and German
74.2% Nonspecific Northern European
0.1% Sardinian
0.2% Nonspecific Southern European
6.5% Nonspecific European
0.1% Unassigned

Conservative Estimate 
0.3% British and Irish
71.1% Nonspecific Northern European
0.1% Nonspecific Southern European
28.0% Nonspecific European
0.5% Unassigned

Speculative Estimate
56.7% British and Irish
10.7% French and German
0.1% Scandinavian
31.2% Nonspecific Northern European
0.3% Sardinian
0.5% Nonspecific Southern European
0.4% Nonspecific European

The Sardinian and Southern European percentages are undoubtedly false positives. It is not clear if the French and German admixture appears because of the difficulties in distinguishing between British, French and German populations or if this is a reflection of more distant admixture from the Normans and Saxons.

This screenshot shows the much improved Ancestry Composition with a view of my Speculative Estimate.

These are my percentages for the Regional and Global Resolutions:

Regional Resolution
Standard Estimate
93.2% Northern European
0.2% Southern European
6.5% Nonspecific European
0.1% Unassigned

Conservative Estimate
71.4% Northern European
0.1% Southern European
28% Nonspecific European
0.5% Unassigned

Speculative Estimate
98.8% Northern European
0.9% Southern European
0.4% Nonspecific European

Global Resolution
Conservative Estimate
99.5% European
0.5% Unassigned

Standard Estimate
99.9% European
0.1% Unassigned

Speculative Estimate
100% European
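
The regional figures appear to be simple roll-ups of the sub-regional ones. The Python sketch below checks this for my Standard and Conservative estimates. The mapping of sub-regions to regions is my own assumption, based on how the categories are named, and is not something taken from 23andMe's documentation.

```python
# Assumed grouping of sub-regional categories into regional ones.
SUBREGION_TO_REGION = {
    "British and Irish": "Northern European",
    "French and German": "Northern European",
    "Scandinavian": "Northern European",
    "Nonspecific Northern European": "Northern European",
    "Sardinian": "Southern European",
    "Nonspecific Southern European": "Southern European",
    "Nonspecific European": "Nonspecific European",
    "Unassigned": "Unassigned",
}

# Sub-regional percentages as listed in this post.
standard = {
    "British and Irish": 17.4,
    "French and German": 1.6,
    "Nonspecific Northern European": 74.2,
    "Sardinian": 0.1,
    "Nonspecific Southern European": 0.2,
    "Nonspecific European": 6.5,
    "Unassigned": 0.1,
}
conservative = {
    "British and Irish": 0.3,
    "Nonspecific Northern European": 71.1,
    "Nonspecific Southern European": 0.1,
    "Nonspecific European": 28.0,
    "Unassigned": 0.5,
}


def roll_up(subregional: dict) -> dict:
    """Sum sub-regional percentages into their parent regions."""
    totals = {}
    for name, percent in subregional.items():
        region = SUBREGION_TO_REGION[name]
        totals[region] = totals.get(region, 0.0) + percent
    return {region: round(total, 1) for region, total in totals.items()}


print("Standard:", roll_up(standard))
print("Conservative:", roll_up(conservative))
```

The Northern European totals come out at 93.2% and 71.4%, matching the published regional figures. Small differences of 0.1% elsewhere are to be expected because the displayed percentages have already been rounded.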

Although the subregional representations do not assign me as much British ancestry as might be expected it is worth bearing in mind that these analyses are still in their infancy. 23andMe explain in their Ancestry Composition guide that their reference populations are largely drawn from their customer base and are supplemented from public reference datasets such as the Human Genome Diversity Project, HapMap, and the 1000 Genomes project.1 However, only a small number of genomes are as yet available in the public datasets. The 23andMe customers who are included in the reference dataset are required to have four grandparents born in the same non-colonial country. Although 23andMe were reported to have 180,000 paying customers in their database as of 9th October 2012, their customers are mostly Americans of mixed ancestry, few of whom will meet the qualifying criteria.2 Not all of the 23andMe customers will in any case have filled out the ancestry questionnaire. With the combination of 23andMe customers and public datasets there are just 7,868 people in the reference dataset used for Ancestry Composition. As all four of my grandparents were born in the UK I presume my own results have been included in this reference dataset. I think it is a shame that 23andMe's questionnaire does not split up the United Kingdom into the four constituent countries as it would be more interesting to see if differences could be found between England, Scotland, Wales and Northern Ireland, rather than lumping all four very different countries together.

23andMe very helpfully provide details of the reference populations that they have used in their analysis. Below are screenshots showing the figures for the reference populations which appear in my Speculative Estimate.





As can be seen, the numbers are very small, but 23andMe have designed the Ancestry Composition tool in such a way that the results can be updated on a regular basis as and when more populations are added to the reference databases, so no doubt the accuracy of the predictions will improve over time. Those of us from the British Isles can probably expect to see big improvements when the datasets from the People of the British Isles Project become available. This project has tested over 4,500 people from the UK.3 To be eligible for the project people must have not just four grandparents from the same country but four grandparents from the same rural county. It might, therefore, one day be possible to assign percentages of DNA to specific English counties or regions.

"British/Irish" DNA seems to have been a particular problem with Ancestry Composition. In 23andMe's validation tests the tool was very accurate for the DNA that it did assign as British and Irish (a "precision" level of 0.90), but it was much less successful at identifying all British/Irish DNA as British/Irish. The technical term for this is the recall rate. The recall rate for British and Irish DNA in the 23andMe validation tests was 0.32 (that is, 32%), meaning that 68% of British and Irish DNA will not be picked up.1 The recall rate will no doubt improve as more reference samples are added to the database. However, it is difficult to quantify British or Irish DNA because we are an admixed population, comprising a mixture of DNA from many different groups such as the Saxons, Celts, Vikings, Picts, Normans, Bretons and Romans.

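For readers who like to see the arithmetic, here is a minimal sketch of how precision and recall are calculated. The counts are entirely hypothetical; I have simply chosen them so that they reproduce the 0.90 precision and 0.32 recall quoted for British/Irish DNA, and they are not 23andMe's actual validation data.

```python
def precision(true_pos: int, false_pos: int) -> float:
    """Of everything labelled British/Irish, the fraction that really is."""
    return true_pos / (true_pos + false_pos)


def recall(true_pos: int, false_neg: int) -> float:
    """Of everything that really is British/Irish, the fraction labelled as such."""
    return true_pos / (true_pos + false_neg)


# Hypothetical counts of DNA segments (illustrative assumptions only)
true_pos = 360   # British/Irish segments correctly labelled British/Irish
false_pos = 40   # other segments wrongly labelled British/Irish
false_neg = 765  # British/Irish segments given some other label

p = precision(true_pos, false_pos)
r = recall(true_pos, false_neg)

print(f"precision = {p:.2f}")
print(f"recall    = {r:.2f}")
print(f"British/Irish DNA not picked up = {1 - r:.0%}")
```

With these made-up numbers, nine out of ten segments labelled British/Irish really are British/Irish, yet roughly two thirds of the genuine British/Irish segments end up with some other label, which would be consistent with a low British percentage for someone with wholly British ancestry.
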
Chromosome view
As well as the map view there are two alternative views: split view and chromosome view. To use the split view it is necessary to have one parent in the 23andMe database. As my parents have not tested with 23andMe I cannot make use of this feature. I can, however, access the chromosome view, which provides an interesting breakdown of the various percentages on the individual chromosomes. The screenshot below shows my Speculative Estimate.

With so many similar shades of blue it's quite difficult to distinguish the individual populations that make up each chromosome, though you can hover over a specific population to get a clearer picture. The screenshot below picks out the chromosomes where 23andMe speculates that I match with French and German populations.

As can be seen, whole chromosomes seem to have been matched with French and German populations, which I don't quite understand. I don't have any French or German ancestry within the last several hundred years. At the population level all British people would be expected to share many markers in common with the French and the Germans, but after several hundred years I would have thought that there would only be tiny segments of "French" and "German" DNA scattered throughout my genome.

Neanderthal DNA
In addition to Ancestry Composition another interesting and fun feature of the 23andMe test is that it will give you your percentage of Neanderthal admixture. This feature was introduced in December 2011.4 23andMe estimate that 2.5% of my DNA is inherited from Neanderthals.

Neanderthal percentages are also provided by the new Geno 2.0 test from the Genographic Project, which also provides percentages of Denisovan DNA. I imagine that 23andMe will eventually update their test to provide Denisovan percentages.

Conclusion
Ancestry Composition is a great improvement on 23andMe's Ancestry Painting. The percentages seem to be much more accurate than those provided by AncestryDNA. 23andMe also has the advantage of providing technical information on the methodology used by its scientists, along with valuable details of the reference populations used for the analysis, features which are notably absent at AncestryDNA. Family Tree DNA's Family Finder test includes a tool known as Population Finder. An update to Population Finder is expected in the New Year and it will be interesting to see how this compares with Ancestry Composition.

Other blog posts on Ancestry Composition
A number of other bloggers have written about their experiences with Ancestry Composition or provided commentary. Here is a list of the posts I have found to date. I will update the list as and when new posts are discovered:
- 23andMe's new Ancestry Painting - first look! by CeCe Moore. This post includes screenshots showing statistics on all the reference populations used by the Ancestry Composition tool.
- 23andMe Ancestry Composition Examples Part 1 by Andrea Badger. This post includes a magnificent selection of screenshots from people with a variety of mixed heritage producing a wonderful rainbow of colours.
- New worldview at 23andMe by Roberta Estes.
- My Ancestry Composition from 23andMe by Aidan Byrne.
- 23andMe Ancestry Composition by Dienekes Pontikos.
- Admixture advances by Judy Russell.
- 23andMe adds ancestry composition by John Reid.
- Is Daniel MacArthur 'desi' by Razib Khan.

References
1. Ancestry Composition: 23andMe's State-of-the-Art Geographic Ancestry Analysis. Anonymous article on the 23andMe website. Accessed 8th December 2012.
2. How many paying customers does 23andMe have? Answer provided on Quora.com website by 23andMe software developer Alex Kohmenko on 9th October 2012.
3. The website of the People of the British Isles Project keeps track of the collection progress. As of 8th December 2012 it was reported that 4,538 samples had been collected.
4. Find your inner Neanderthal. 23andMe blog post by Scott H, 15th December 2011.

See also
My four part feature on "Exploring my genome with 23andMe":
Part 1 Disease risks
Part 2 Carrier status and drug responses
Part 3 Traits
Part 4 Ancestry

© 2012 Debbie Kennett