Saturday, 3 May 2014

Driving in the wrong direction with a dodgy DNA satnav

I've been receiving a lot of questions in the last couple of days about the new DNA "satnav" tool called GPS (Geographic Population Structure), which purports to pinpoint the village that your ancestors lived in one thousand years ago. See, for example, the articles in the Daily Mail and the Washington Post. There was also some prominent and uncritical coverage on BBC Breakfast News on Thursday, featuring a segment in which the BBC weather presenter Carol Kirkwood was given the results of her DNA test on air and told that her ancestors were from the town of Crieff in Scotland. As Chris Jiggins has pointed out on Twitter, the acronym GPS seems to have been chosen deliberately to "promote a completely false sense of accuracy".

The company offering this service is a new start-up by the name of Prosapia Genetics, which has been set up by Tatiana Tatarinova from the Children's Hospital Los Angeles. The company proudly proclaim on their website: "Our first tool, GPS, will tell you where your DNA was forged, and is accurate to home village with a time resolution of the past 1,000 years."

The reports are based on an analysis of autosomal SNPs. You can either order a test through Prosapia Genetics, who appear to have an affiliate relationship with Family Tree DNA, or you can submit your raw data file from a test you've already taken with one of the companies that offers autosomal DNA testing: AncestryDNA (US only), 23andMe, Family Tree DNA, Geno 2.0 or BritainsDNA/ScotlandsDNA. A range of reports is offered with prices varying depending on the number of reference populations used for the analysis. The reports simply give you a set of geographical co-ordinates, which are supposed to represent the "ancient home" of all of your ancestors, and a map showing where your ancestors lived. We are now getting feedback from a number of people who've paid for this service and it would appear, not surprisingly, that the reality does not match the hype.

Julie Matthews bought the Basic Test, which covers 100 reference populations. She commented in the Facebook R1b-L21 group:
I spent $29 to discover that my "homeland" was in the middle of the River Humber in England. I knew we all descended from fish - here's proof. Don't waste your money!
Teresa Vega paid for the Super Test, which includes 500 reference populations. She writes in the ISOGG Facebook group:
Totally unconvincing. Stupid me paid $42.99 for nada! My ancestral home is smack dab west of Puerto Rico in the Atlantic Ocean! I learned nothing and it told me to upgrade to another test for more detailed results -- a test they don't even have listed! Don't believe the hype!!!!
Teresa's report can be seen online here.

JoAnn O'Linger had a similarly misleading result. She reports in the ISOGG Facebook group:
I had a similarly disappointing result from Prosapia (paid for), it was the "Super Test" as well: 
" JoAnn ordered a Super GPS Test of her DNA data. We found the following GPS Co-ordinates : Latitude 56.7811288256845 and Longitude 4.26921663910535 
A map pointing the location is given below with a short guide on how to interpret this results.
How to interpret your results? 
GPS coordinates indicate the place where your DNA was forged before your family may have moved to your current location. Because borders changed throughout history, your ancestors may have been part of an ancient country once ruled the region. If your GPS coordinates are in the water, it indicates mixture between two populations on the two ends of the body of water, in which case we suggest you register to the upcoming GPS2 tool that would provide you with the origins of your parents. If you wish to learn more about your past, we suggest you try the Advanced test or the Super test, which provide much higher accuracy." 
JoAnn says: "Those coordinates are squarely in the North Sea, which does make sense as I am the typical American mutt, with mostly Irish and English heritage, but if one goes further back, much of that is from Norman French and Gaelic-Norse Orcadians. So it makes sense, but in my opinion it's not worth the high price."
Prosapia Genetics have a Forum where you can read the comments from their customers, many of whom have expressed similar disappointment at the service offered:

http://prosapiagenetics.com/community/viewforum.php?f=2

[Update 10th May 2014: The Prosapia Genetics Forum is now restricted to members only. I am told that complaints and negative comments have been deleted and comments are being moderated.]

This is not surprising as the whole concept of the test is fundamentally flawed. If we assume 30 years per generation and go back 35 generations, which is roughly 1,050 years, we theoretically have 2^35 = 34,359,738,368 ancestors in that generation alone. This figure does of course exceed the population of the world at that time, and in reality there will be lots of pedigree collapse, which will reduce the number of distinct ancestors considerably. Even so, the mind-boggling figures demonstrate that it is quite meaningless to try and pinpoint a single geographical location as the origin of all those diverse ancestors one thousand years ago. Furthermore, we only inherit the DNA of a tiny subset of our ancestors. To understand why this is the case, read Luke Jostins' blog post "How many ancestors share our DNA" and the posts from Graham Coop and Blaine Bettinger that are linked in that article.
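The doubling arithmetic is easy to check for yourself. The following is only a minimal sketch in Python: the 30-year generation length comes from the paragraph above, but the world population figure of around 300 million for a thousand years ago is a rough assumption on my part, and the function names are purely illustrative.

```python
YEARS_PER_GENERATION = 30  # generation length assumed in the text above

# Rough assumption for illustration only: world population about 1,000 years ago.
ASSUMED_WORLD_POPULATION = 300_000_000


def years_back(generations: int) -> int:
    """Approximate number of years covered by the given number of generations."""
    return generations * YEARS_PER_GENERATION


def ancestor_slots(generation: int) -> int:
    """Theoretical number of ancestors in one generation (2 parents, 4 grandparents, ...)."""
    return 2 ** generation


def first_generation_exceeding(population: int) -> int:
    """First generation whose theoretical ancestor count is larger than `population`."""
    generation = 1
    while ancestor_slots(generation) <= population:
        generation += 1
    return generation


if __name__ == "__main__":
    print(f"35 generations is about {years_back(35):,} years")
    print(f"Theoretical ancestors in generation 35: {ancestor_slots(35):,}")
    crossover = first_generation_exceeding(ASSUMED_WORLD_POPULATION)
    print(f"Ancestor slots exceed the assumed population at generation {crossover}")
```

Under these assumptions the theoretical number of ancestor slots overtakes the assumed world population several generations before we even reach the one-thousand-year mark, which is why pedigree collapse is unavoidable.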

Even if it were possible to pinpoint a single location to represent our millions of ancestors from a thousand years ago, we would need accurate "maps" in the form of carefully sampled reference populations in order to be able to use our DNA satnav. Unfortunately, we only have a limited number of reference populations available, many of which have been sampled for medical purposes with no attempt made to collect the relevant "co-ordinates" in the form of detailed genealogical information. Consequently, any maps included in a reference genome "satnav" are going to have massive black holes. It is therefore not surprising that this DNA satnav is misdirecting people into rivers and oceans!

The methodology behind the GPS tool was outlined in a paper by Elhaik et al. entitled "Geographic population structure analysis of worldwide human populations infers their biogeographical origins". The paper was published in the scientific journal Nature Communications. Despite the fact that the Prosapia Genetics website appears to have been launched on the same day that the paper was published, Tatiana Tatarinova, the founder of the Prosapia Genetics website and one of the lead authors, has not declared any "competing financial interests". The paper has already been the subject of controversy. The technique described in the paper offers nothing new and it is claimed that the methodology has been copied from that used by the blogger Dienekes Pontikos, who writes under a pseudonym. For background see Dienekes' two blog posts on the subject:

- Nature Communications, the Genographic Project, Elhaik et al. re-discover zombies, the Oracle, etc. 3 years after the fact...
- The Geographic Position Structure (GPS) algorithm of Elhaik et al. (2014) is basically wrong

See in particular the comments section of the first of the above two posts, where Eran Elhaik attempts to defend himself against the charge of plagiarism.

Joe Pickrell, one of the reviewers of the paper, has posted a summary of his critique which is well worth a read. The review can be found here:

http://jkplab.org/2014/04/30/review-geographic-population-structure-gps-of-worldwide-human-populations-infers-biogeographical-origin/

The authors themselves concede in the paper that the technique has its limitations and will only work if "the appropriate samples are available in the reference population data set". They appear to have cherry-picked some conveniently isolated populations such as the Sardinians for the purposes of their study, but the technique did not work for other populations:
To test GPS’s accuracy with individuals from populations that were not included in the reference population set, we conducted two analyses. We first repeated the previous analysis using the leave-one-out procedure at the population level. As expected, GPS accuracy decreased with 50% of worldwide individuals predicted to be 450 km away from their true origin. The predicted distance increased to 1,100 and 1,750 km for 80 and 90% of the individuals, respectively (Fig. 4a). Because GPS best localizes individuals surrounded by M genetically related populations, populations from island nations (for example, Japan and United Kingdom) or populations whose most related populations were under-represented in our reference population data set (for example, Peru and Russia) were most poorly predicted. Consequently, the median distances to the true origin were much smaller for individuals residing in Europe (250 km), Africa (300 km) and Asia (450 km) due to their being more commonly represented in the reference population data set compared with Native Americans and Oceanians. These results represent the upper limit of GPS’s accuracy when the specific population of the test individual is absent from the reference population data set.
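To get a feel for what errors of several hundred kilometres mean on the ground, the distance between a predicted point and a reference point can be worked out with the standard haversine (great-circle) formula. This is only a minimal sketch under my own assumptions: I do not know exactly how the paper measured its distances, the mean Earth radius is an approximation, and the co-ordinates for Crieff are my own rough values used purely as an illustrative reference point. The other pair of co-ordinates is the one quoted in JoAnn's report above.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points given in decimal degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    delta_phi = phi2 - phi1
    delta_lambda = radians(lon2 - lon1)
    a = sin(delta_phi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(delta_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Co-ordinates quoted in JoAnn's report (latitude, longitude).
reported_point = (56.7811288256845, 4.26921663910535)

# Assumption for illustration only: approximate co-ordinates for Crieff, Scotland.
crieff_approx = (56.37, -3.84)

distance_km = haversine_km(*reported_point, *crieff_approx)

if __name__ == "__main__":
    print(f"Distance between the two points: {distance_km:.0f} km")
```

The two points are separated by a distance of the same order as the median error quoted in the paper for populations missing from the reference set, which is the difference between a Scottish town and the middle of the North Sea.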
A hyped-up press release was issued by the University of Sheffield, which also includes a link to a video on YouTube. As is often the case, the media have picked up on the hype in the press release and have made no attempt to read the scientific paper and understand the limitations of the methodology. I hope that there have not been too many people who have paid out good money for these misleading DNA satnav reports.

Note that if you've taken a test with one of the genetic genealogy companies there are many free services that you can use to get an alternative reading of your data and a prediction of your "ethnicity", all of which will give much better results than the commercial offerings from Prosapia. One of the best free websites is GEDmatch, which allows you to get readings from a wide range of different services. You can find a full list of services in the ISOGG Wiki article on admixture analyses. However, it is still very difficult to distinguish between populations at anything more than the continental level, and all such reports should be treated with a very large pinch of salt.

Update 6 May 2014
Teresa Vega now tells me that she has received a full refund for her test from PayPal. She told PayPal that she had felt misled by the company's claims and she was unhappy that they had recommended upgrading to a test that they did not even have on their site. JoAnn O'Linger is now also in the process of applying for a refund.

Acknowledgements
Many thanks to Julie Matthews, JoAnn O'Linger and Teresa Vega for permission to use their quotes and reports.

See also
Since writing this article I have discovered other discussions on the subject. I have posted the relevant links below and will update the list if further links become available:
- Prosapia Genetics - Worth the money? A review by Lorine McGinnis Schulze
- Researchers develop DNA GPS tool to accurately trace geographical ancestry - a discussion on the Reddit forum
- Is GPS DNA tracking too good to be true? An article by Peter Calver in the Lost Cousins newsletter, May 2014
- So many genes, so close to home by Matthew Thomas, BioNews, 12 May 2014.