Tuesday, 20 January 2015

What is the current size of the consumer genomics market?

The subject of how many people have taken a DNA test is always the source of much speculation, and reliable figures are hard to come by. However, in a report published this week by GenomeWeb, Spencer Wells, director of National Geographic's Genographic Project, anticipates that "the 3 millionth person [will] test him or herself during the next few months". In the same article Roberta Estes, who writes the popular DNAeXplained blog, suggests that the three million milestone might already have been achieved. She notes: "23andMe has stated publicly that it has genotyped 800,000 kits, AncestryDNA and the Genographic Project each has genotyped perhaps more than 700,000, and Family Tree DNA has genotyped close to 120,000 people for its Family Finder autosomal DNA offering alone." I thought I would take a look at the available sources for the different companies to see if it might be possible to verify these figures and provide an estimate of the current total.

23andMe
23andMe state in their media fact sheet that they have genotyped more than 800,000 customers.

The 23andMe test is sold in 56 countries of the world. However, I estimate that about 90% of their customer base is in the US. Canada and the UK are currently the only countries where the 23andMe test includes the health and trait reports.

The Genographic Project
The Genographic Project's home page states, as of today's date, that the project has 705,343 participants.

I understood that the Genographic Project kit could be purchased from any country in the world, but from the dropdown menu in their online shop it would appear that the kit is now sold in just 33 countries.

AncestryDNA
AncestryDNA confirmed in August 2014 that they had tested over 500,000 DNA customers. In a presentation given towards the end of last year, Ken Chahine, Ancestry's Senior Vice President and General Manager, stated that AncestryDNA were selling 30,000 to 50,000 DNA kits per month. If we take the middle figure of 40,000 multiplied by six that gives us a figure of 240,000 kits sold since August 2014, bringing the total up to 740,000.
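The estimate above is sensitive to which monthly sales figure is used. The following is a minimal sketch of the same arithmetic, using only the numbers quoted in the paragraph; the variable and function names are illustrative, and the low and high bounds are simply the two ends of the quoted sales range rather than figures from Ancestry.

```python
# Figures quoted in the paragraph above.
CONFIRMED_AUGUST_2014 = 500_000          # "over 500,000" customers
MONTHLY_SALES_RANGE = (30_000, 50_000)   # kits sold per month
MONTHS_ELAPSED = 6                       # multiplier used in the text


def estimate_total(base: int, monthly_sales: int, months: int) -> int:
    """Base figure plus a constant monthly sales rate over a number of months."""
    return base + monthly_sales * months


low_rate, high_rate = MONTHLY_SALES_RANGE
mid_rate = (low_rate + high_rate) // 2   # the "middle figure" of 40,000

low = estimate_total(CONFIRMED_AUGUST_2014, low_rate, MONTHS_ELAPSED)
mid = estimate_total(CONFIRMED_AUGUST_2014, mid_rate, MONTHS_ELAPSED)
high = estimate_total(CONFIRMED_AUGUST_2014, high_rate, MONTHS_ELAPSED)

print(f"low {low:,} / mid {mid:,} / high {high:,}")
```

On these assumptions the plausible range runs from 680,000 to 800,000, with 740,000 as the midpoint.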

The AncestryDNA test is currently only sold in America, but there are plans to launch the test in the UK, Ireland, Australia and perhaps other countries later this year.

Family Tree DNA
Family Tree DNA provide details only on the number of different types of tests taken and not the total number of customers. According to their website, as of today's date, their stats are as follows:

- 520,257 Y-chromosome DNA records in the database. The Y-DNA database includes 180,005 people who have tested at least 37 Y-STR markers. The FTDNA database also includes several thousand people who have taken the advanced BIG Y test, a comprehensive Y-chromosome sequencing SNP discovery test. FTDNA almost certainly have the largest Y-chromosome DNA database in the world with samples tested at higher resolution than in any other database.

- 190,105 mitochondrial DNA records in the database. The mtDNA database includes 47,849 people who have taken the full mitochondrial sequence (FMS) test. (This test was previously known as the FGS - full genomic sequence test). FTDNA probably have the world's largest database of full mtDNA genomes.

- The number of autosomal Family Finder tests in the FTDNA database has not been publicly disclosed. It is not clear if the 120,000 figure cited by Roberta Estes in the GenomeWeb article mentioned above is an estimate or an actual figure obtained from FTDNA staff, but the number certainly seems to be in line with my own estimates.

FTDNA sell their tests in theory to any of the 200 or so countries of the world. However, they are unable to ship to Iran and Sudan because of customs restrictions.

FTDNA have partnerships with the European company iGENEA and the Middle Eastern company DNA Ancestry & Family Origin. These partnerships have helped to bring in many non-English-speaking customers from Europe and the Middle East, though many more people from these regions will have tested direct with FTDNA.

iGENEA kit numbers are preceded by the letter E. The iGENEA kit numbers in my mtDNA Haplogroup U4 Project go up to kit no. E17977 so it would appear that nearly 20,000 Europeans have tested through iGENEA. Many Europeans will also have tested directly through FTDNA. (It is in fact considerably cheaper to order direct through FTDNA rather than through iGENEA, but iGENEA do have the advantage of a website which is available in French, German, Spanish and Italian.)

The kits from the Middle East are preceded by the letter M. The highest kit with the M prefix that I can find in the large Arab Tribes DNA Project is kit no. 9658 so there are perhaps around 10,000 people who have tested through the FTDNA affiliate in the Middle East.

Family Tree DNA also have partnerships with a number of smaller companies such as DNA Worldwide and Jewish Voice, though these partnerships probably only account for a few thousand kits. For details on the various prefixes see the ISOGG Wiki article on Family Tree DNA kit numbers.

The international diversity of the FTDNA database can be seen in the huge range of geographical DNA projects, which are run by volunteer project administrators from around the world.

Family Tree DNA are the testing partner for the Genographic Project, and all the Geno 2.0 tests are processed in FTDNA's lab in Houston, Texas. Genographic Project participants have the option of transferring their results into the FTDNA database. Genographic Project kit numbers are preceded by the letter N. The highest Genographic Project kit number in the Haplogroup U4 Project is kit number N129937. We therefore know that around 130,000 Genographic Project customers have transferred their results to FTDNA.

Family Tree DNA are the only company who will accept autosomal transfers from other testing companies. They can accept transfers for people who have tested at both AncestryDNA and 23andMe. However, 23andMe transfers can only be accepted if the test was done on the version 3 chip which was sold between November 2011 and November 2013. Kit numbers for the autosomal transfers are prefixed by the letter B. The same prefix is also used for Y-DNA transfers from AncestryDNA and DNA Heritage. AncestryDNA no longer offer Y-STR testing. FTDNA purchased the British company DNA Heritage in April 2011. The highest B kit I can find in my projects is B39616 in the Haplogroup U4 Project, so it would appear that there are getting on for 40,000 third-party transfers in the FTDNA database. Both DNA Heritage and AncestryDNA only ever had quite small Y-DNA databases, and in any case not everyone transferred their Y-DNA results, so I would guess that the majority of the third-party transfers (perhaps in the region of 35,000) are autosomal results from 23andMe and AncestryDNA. It is not clear if the third-party transfers are included in the estimate of the size of the FTDNA Family Finder database or if these transfers are in addition to the autosomal tests processed directly by FTDNA.
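The kit-number reasoning used in the last few paragraphs can be sketched in code: split each kit number into its letter prefix and its numeric part, then take the highest number seen for each prefix as a rough upper bound on the kits from that source. This is only an illustration. The prefix meanings are the ones described above, the function names are hypothetical, and treating an unprefixed number as a direct FTDNA order is an assumption.

```python
import re

# Prefix meanings as described in the text above.
PREFIX_SOURCES = {
    "E": "iGENEA",
    "M": "Middle Eastern partner",
    "N": "Genographic Project transfer",
    "B": "third-party transfer",
}


def parse_kit(kit: str) -> tuple[str, int]:
    """Split a kit number such as 'E17977' into (source, number)."""
    match = re.fullmatch(r"([A-Z]*)(\d+)", kit.strip().upper())
    if match is None:
        raise ValueError(f"unrecognised kit number: {kit!r}")
    prefix, number = match.groups()
    if prefix == "":
        source = "FTDNA direct"  # assumption: no prefix means a direct order
    else:
        source = PREFIX_SOURCES.get(prefix, "other partner")
    return source, int(number)


def highest_per_source(kits: list[str]) -> dict[str, int]:
    """Highest kit number seen for each source, a rough proxy for kit count."""
    highest: dict[str, int] = {}
    for kit in kits:
        source, number = parse_kit(kit)
        highest[source] = max(highest.get(source, 0), number)
    return highest


# Kit numbers quoted in the paragraphs above.
print(highest_per_source(["E17977", "N129937", "B39616"]))
```

The highest number is only a proxy: it assumes kit numbers are issued sequentially within each prefix and that the projects sampled include a recent kit.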

It is impossible from these figures to determine precisely how many individuals there are in the Family Tree DNA database because many people who have ordered a Y-DNA test will also have gone on to order a Family Finder test and/or a mitochondrial DNA test and vice versa. The kit numbers probably provide the closest approximation of the number of people in the database. My highest FTDNA kit number is kit number 394825 in the Devon DNA Project. It may well be that the 400,000 milestone has already been passed. If we assume that there are 400,000 FTDNA kits, 130,000 Genographic transfers, 20,000 iGENEA kits, 10,000 kits from FTDNA's Middle Eastern partner, and 5,000 miscellaneous kits, we get a figure of 565,000 which is probably a reasonable estimate of the number of individuals in the FTDNA database.
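For anyone who wants to vary the assumptions, the sum above is easy to reproduce. This is a minimal sketch using the rounded figures from the paragraph; the dictionary keys are just labels.

```python
# Rounded component estimates from the paragraph above.
ftdna_components = {
    "FTDNA kits": 400_000,
    "Genographic transfers": 130_000,
    "iGENEA kits": 20_000,
    "Middle Eastern partner kits": 10_000,
    "miscellaneous kits": 5_000,
}

ftdna_estimate = sum(ftdna_components.values())

for label, count in ftdna_components.items():
    print(f"{label:<30}{count:>10,}")
print(f"{'Total':<30}{ftdna_estimate:>10,}")
```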

Other companies
In addition to the big four companies there are a number of other smaller companies such as BritainsDNA, Oxford Ancestors and GeneBase which sell genetic ancestry tests direct to the consumer. A full list of DNA testing companies can be found in the ISOGG Wiki. However, none of these smaller companies disclose the size of their databases, and many of the people who've tested with the smaller companies have retested with one of the big four companies. I hesitate to estimate the number of people tested with these different companies but I do not think the figure can be more than 50,000 and is very likely to be much less than this.

What is the total?
To sum up, the total number of individuals tested at each of the four big companies is as follows:

Genographic Project    705,343
23andMe                800,000+
Family Tree DNA        565,000 (DK estimate)
AncestryDNA            740,000 (DK estimate)

If we add all these figures together we get a total of 2,810,343. However, this figure makes no allowance for the significant overlap in the four databases as there are many people who have tested at multiple companies. For example, I've had my own DNA tested at 23andMe, Family Tree DNA and AncestryDNA. We can subtract the 130,000 people who have transferred their Genographic results to FTDNA and we can perhaps estimate that about 35,000 people have transferred autosomal DNA results to FTDNA. That brings the total down to 2,645,343. There is probably more overlap than I've allowed for, but it does seem very likely that there are currently around two and a half million people in the world who have paid for a DNA test with the big four companies. It will be interesting to see what these figures look like this time next year.
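The calculation above can be written out as a short sketch. It uses only the figures given in this post. The final step, which works out how much further overlap would be needed to reach "two and a half million", is an illustration added here and not a measured quantity.

```python
# Per-company figures from the summary above (23andMe taken at its floor).
company_totals = {
    "Genographic Project": 705_343,
    "23andMe": 800_000,
    "Family Tree DNA": 565_000,
    "AncestryDNA": 740_000,
}

# Overlaps that can be estimated from FTDNA kit-number prefixes.
known_overlap = {
    "Genographic results transferred to FTDNA": 130_000,
    "autosomal results transferred to FTDNA": 35_000,
}

gross_total = sum(company_totals.values())
net_total = gross_total - sum(known_overlap.values())

# Illustration only: extra overlap implied by a round figure of 2.5 million.
ROUND_FIGURE = 2_500_000
implied_further_overlap = net_total - ROUND_FIGURE

print(f"Gross total:             {gross_total:,}")
print(f"After known overlap:     {net_total:,}")
print(f"Implied further overlap: {implied_further_overlap:,}")
```

In other words, settling on two and a half million assumes that roughly another 145,000 people, a little over 5% of the adjusted total, appear in more than one database.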

© 2015 Debbie Kennett

Thursday, 8 January 2015

The Ancestry Y-DNA and mtDNA samples have not been destroyed after all

Ancestry announced back in June 2014 that they would be retiring their Y-DNA and mtDNA tests. Ken Chahine, Ancestry's Senior Vice-President, wrote at the time that the company had taken the decision to destroy the Y-DNA and mtDNA samples:
"Second, as part of the decision to retire Y-DNA and mtDNA tests we were faced with another difficult decision of what to do with the customer samples. On the one hand, we understand the value of these samples to many of you. On the other hand, we take customer privacy seriously and, regrettably, the legal framework used to collect these samples does not allow us to retest or transfer those samples. Practically speaking, many of these samples are also no longer useable. For example, many of the swabs were exhausted of genetic material during our testing or the sample may be past its shelf life. In the end we made the difficult decision to destroy the samples and are committed to trying to find solutions to these roadblocks for future products." (Source: http://blogs.ancestry.com/ancestry/2014/06/12/comments-on-y-dna-and-mtdna-tests)
There was widespread concern in the genetic genealogy community at the potential loss of this valuable resource. The decision was particularly hard on those people with deceased relatives in the AncestryDNA database. Many people wrote to Ancestry to ask for the samples to be retained and a petition was started asking them to reconsider. A number of leading genetic genealogists and bloggers also pleaded directly with Ancestry to ask them to change their minds. Nevertheless, it was widely assumed that once Ancestry's Y-DNA and mtDNA database was taken down at the end of September 2014, the DNA samples would be destroyed at the same time. However, I'd heard unofficially that the samples hadn't been destroyed after all so I asked Mike Mulligan, International Product Manager of Ancestry.com, for clarification on the issue. This is the official response that he sent me from the AncestryDNA team:
AncestryDNA stores DNA samples in a secure facility designed specifically for the preservation of DNA. Though we no longer offer Mitochondrial and Y-DNA specific DNA tests, Ancestry continues to store the DNA samples collected from the past. We are currently in discussion as to the future of the stored Y-DNA and Mitochondrial samples and take this responsibility seriously. Ancestry understands the value of the tests to family history research and for this reason, members will continue to have access to their digital results by downloading the file from the AncestryDNA results page. This feature will be available for the foreseeable future. 
When a decision is made, Ancestry will work to inform customers affected by these changes. In the meantime, be assured that Ancestry is working toward the best outcome for the Mitochondrial and Y-DNA samples.
It's reassuring to know that Ancestry have listened to their customers, and I hope that a satisfactory solution can be found for everyone concerned.

Tuesday, 9 December 2014

Richard III and the use of DNA as evidence

The long-awaited scientific paper with details of the Richard III DNA analysis has finally been published. Twenty-two months have passed since the memorable press conference at the University of Leicester in February 2013 when Richard Buckley, the lead archaeologist on the Richard III dig, declared that "It is the academic conclusion of the University of Leicester that beyond reasonable doubt the individual exhumed at Greyfriars in September 2012 is indeed Richard the III, the last Plantagenet king of England." However, at that time most of the DNA work had yet to be done and none of the findings had been written up and published in peer-reviewed journals, though the evidence already seemed to be overwhelming. Since then five peer-reviewed papers have been published on different aspects of the study (see the list below). The DNA paper by Turi King et al., “Identification of the remains of King Richard III”, has been published in the journal Nature Communications.


It is accompanied by 56 pages of supplementary material with all the technical details about the DNA testing (including Richard III's Y-DNA and mtDNA haplogroup assignments) and extensive genealogical information. This is perhaps the first time that a scientific paper has included such detailed genealogical content. It is well worth reading the paper in its entirety together with all the accompanying material. It is a masterclass in how to do ancient DNA research and how to correlate DNA with genealogical evidence. In addition, the University of Leicester have issued an official press release and this includes further information about the study as well as links to a number of interesting videos showing how the genetic and genealogical research was done.

The DNA results have been extensively covered in the media, with most reports, such as this one from the BBC, focusing on the lack of a Y-DNA match and the possible implications for the monarchy, though many people have also commented that it might perhaps have been more of a surprise if the Y-DNA had matched!

In what is believed to be the first analysis of its kind, the authors brought together the genetic and genealogical evidence, along with previously reported non-genetic evidence, and used a probabilistic assessment to determine whether or not the remains found in the Leicester car park were actually those of Richard III. Such analyses are often presented in courtrooms but have never previously been used to answer genealogical and historical questions. I wonder if this might be the start of a new trend!

The statistical analysis was done by my colleagues at University College London, Professor Mark Thomas and Professor David Balding, working alongside Turi King. Two competing hypotheses were investigated:

Hypothesis 1 (H1): Skeleton 1 is Richard III
Hypothesis 2 (H2): Skeleton 1 is not Richard III

The analysis took into account the genetic evidence from the Y-DNA and mtDNA testing and the previously reported non-genetic evidence (radiocarbon data, estimated age at death, sex, presence of scoliosis, and presence of wounds suffered around the time of death). People always tend to over-estimate the importance of DNA test results which is why in genetic genealogy we always emphasise the need to use DNA in combination with genealogical evidence. DNA can effectively prove that two people are not related on a specific line, and it can generally be used to confirm relationships with very close blood relatives. However, for more distant relationships DNA can only indicate that two cousins share a common genetic ancestor. DNA cannot give us the precise date when that ancestor might have lived and there is always a very wide range of possible dates. Consequently DNA evidence can broadly support a hypothesis but is generally not conclusive in its own right. This is particularly the case with mitochondrial DNA testing. Although it is now possible to sequence the whole mitochondrial genome, mtDNA has a low mutation rate and two people can have an identical mtDNA sequence yet sometimes share a common ancestor who lived several thousand years ago.

The same limitations apply to the case of Richard III. Contrary to popular belief, the mtDNA match on its own did not "confirm" Richard III's identity; it was merely one of a number of pieces of supporting evidence which had to be considered in combination with the conflicting evidence from the lack of a Y-DNA match. Here's an extract from the UCL press release:
Contrary to what many may have expected, the genetic evidence alone is not conclusive, partly because only the mtDNA and Y chromosome are suitable for comparing distantly related individuals. In fact, the Y chromosome, did not match presumed male-line relatives of the king, and so counted against Hypothesis 1. However, this non-match could be explained by one or more false-paternity events (where the biological father is not the father recorded in family history) over 19 generations; such events are not uncommon so the male-line data only weakly favoured Hypothesis 2. The mtDNA evidence was found to support Hypothesis 1, but overall the genetic evidence was not enough to confidently identify Skeleton 1 as Richard III. 
However, when combined with the non-genetic evidence, even after making assumptions intended to count against Hypothesis 1, the authors obtained an overall likelihood ratio of 6.7 million. Even a sceptical translation of this likelihood ratio corresponds to a 99.9994% probability that ‘Skeleton 1’ is the remains of King Richard III, which the scientists believe puts the matter beyond reasonable doubt.
It would have been interesting to see how the probabilities would have worked out if there had also been a Y-DNA match. There is still scope for further DNA testing on descendants of other Y-lines from the higher branches of the tree if suitable candidates can be identified. Now that the Y-DNA results are in the public domain it's also possible that someone might take a DNA test and discover that he matches the Richard III Y-DNA signature which could perhaps encourage genealogical research to determine whether or not there is a connection.
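For readers who want to see how a likelihood ratio turns into a probability, the conversion is a single application of Bayes' rule in odds form: posterior odds = likelihood ratio × prior odds. The sketch below reproduces the 99.9994% figure quoted above on the assumption that the "sceptical" starting point is a prior probability of 1 in 40. That prior is not given in the extract and is an assumption here; the function name is illustrative.

```python
def posterior_probability(likelihood_ratio: float, prior_probability: float) -> float:
    """Bayes' rule in odds form: posterior odds = likelihood ratio * prior odds."""
    prior_odds = prior_probability / (1 - prior_probability)
    posterior_odds = likelihood_ratio * prior_odds
    return posterior_odds / (1 + posterior_odds)


LIKELIHOOD_RATIO = 6.7e6   # overall likelihood ratio quoted in the press release
SCEPTICAL_PRIOR = 1 / 40   # assumption: not stated in the extract above

p = posterior_probability(LIKELIHOOD_RATIO, SCEPTICAL_PRIOR)
print(f"Posterior probability for Hypothesis 1: {p:.4%}")
```

A less sceptical prior would push the probability even closer to 100%, which shows that the conclusion does not depend on starting out favourably disposed to Hypothesis 1.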

Richard III's DNA
Richard III belongs to Y-DNA haplogroup G2 (G-P287). His 23-marker Y-STR profile has already been uploaded to the public Ysearch database and is available via two different Ysearch IDs:

Richard III Ysearch ID 45AER

Richard III Ysearch ID B8YDF

I presume that the authors followed the NIST standards for reporting STR markers. There are different ways of counting the markers and different companies have reported results in different ways. Family Tree DNA have not yet converted their database to the NIST standards. When comparing the Richard III signature on Ysearch with FTDNA results an adjustment would need to be made to the STR marker known as GATA H4.1. See the Marker Standards page on the Sorenson Molecular Genealogy Foundation website for a conversion table.


Richard III's mtDNA places him in haplogroup J1c2c3, a new branch of J1c that is defined by the mutation A12397G. This new subclade was added to the mtDNA tree with the latest build of Phylotree thanks to the work of Ian Logan (see his posting on the Genealogy DNA mailing list). Richard III's mtDNA profile (control region only) has been uploaded to Mitosearch:

- Richard III's Mitosearch ID T227G

All the mitochondrial sequences generated from the Richard III study have been deposited in GenBank under the accession codes KM676292 to KM676294. I cannot find the sequences on GenBank and presume they must have only been submitted very recently so it will take time for them to appear. In the meantime Ian Logan has provided a list of all publicly available J1c2c sequences that have been uploaded to GenBank and www.openSNP.org.

Other peer-reviewed papers on Richard III
The DNA paper is the sixth in a series of papers resulting from the Richard III study. The other papers are:

2) Mitchell PD et al. The intestinal parasites of King Richard III. The Lancet 2013; 382 (989): 888.

4) Lamb AL et al. Multi-isotope analysis demonstrates significant lifestyle changes in King Richard III. Journal of Archaeological Science 2014; 50: 559-565.

5) Appleby J et al. Perimortem trauma in King Richard III: a skeletal analysis. The Lancet, Early Online Publication, 17 September 2014.

© 2014 Debbie Kennett

Tuesday, 2 December 2014

23andMe relaunches health reports in the UK

It has been announced today that 23andMe have reintroduced their health test in the UK. The 23andMe test has been available in the UK since the company launched back in 2006, but in November 2013 23andMe were asked by the Food and Drug Administration in America to withdraw their health reports pending regulatory approval. Existing customers were able to retain access to their health reports but new customers who ordered a kit on or after 22nd November 2013 were only able to receive the ancestry reports. The health reports were restored in Canada in October this year. The UK is now the second country to have renewed access to the 23andMe health reports. UK customers who ordered a 23andMe ancestry test between 22nd November 2013 and 1st December 2014 are now able to receive the new health reports free of charge. New customers in the UK who order a 23andMe test from today onwards will now have access to both health and ancestry reports. The 23andMe UK website can be found at http://www.23andme.co.uk, which redirects to https://www.23andme.com/en-gb/

There has been a slight increase in price. The new test now costs £125 but this price is inclusive of shipping. The old test cost $178.95 ($99 for the test + $79.95 for shipping) which worked out at around £114 per test at the current exchange rate.
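The price comparison can be checked with a few lines of arithmetic. This sketch uses only the figures in the paragraph above. The exchange rate is not stated in the post, so the rate shown is simply the one implied by converting $178.95 to about £114.

```python
from decimal import Decimal

# Prices quoted in the paragraph above.
old_test_usd = Decimal("99.00")
old_shipping_usd = Decimal("79.95")
old_total_gbp = Decimal("114")    # approximate sterling equivalent given in the text
new_total_gbp = Decimal("125")    # new price, inclusive of shipping

old_total_usd = old_test_usd + old_shipping_usd

# Derived here, not quoted in the post.
implied_usd_per_gbp = old_total_usd / old_total_gbp
increase_gbp = new_total_gbp - old_total_gbp
increase_pct = increase_gbp / old_total_gbp * 100

print(f"Old price: ${old_total_usd} (about £{old_total_gbp})")
print(f"Implied exchange rate: {implied_usd_per_gbp:.2f} USD per GBP")
print(f"Increase: £{increase_gbp} ({increase_pct:.1f}%)")
```

On these figures the increase is £11, or just under 10% of the old price.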

The new test is a pared-down version of the previous test, as can be seen in the comparison below.

New UK 23andMe health test    Old 23andMe health test
43 inherited conditions       53 inherited conditions
12 drug responses             24 drug responses
11 genetic risk factors       122 health risks
38 traits                     60 traits

A full list of the reports offered can be seen here: https://www.23andme.com/en-gb/health/reports/#traits

I have access to a UK account which has the new health reports and I've had a chance to have a look around and see what is offered. Previously 23andMe used a star system to grade the confidence levels that they had assigned to reports. In my own 23andMe account I have reports that are graded from one star up to four stars. The grading system is explained as follows:

Four stars: Established Research. At least two studies examined more than 750 people with the trait or condition and/or the associations are widely accepted in the scientific community. The reports may cover rare conditions or include variants that do not greatly influence a person's absolute lifetime risk for a condition.

Three stars: Preliminary Research. More than 750 people with the condition were studied, but the findings still need to be confirmed by the scientific community in an independent study of similar size.

Two stars: Preliminary Research. Fewer than 750 people were studied. Multiple large studies are needed to confirm these findings.

One star: Preliminary Research. Fewer than 100 people were studied. Multiple large studies are needed to confirm these findings.

With the new test only four-star reports are shown for genetic risk factors, drug responses and inherited conditions. The trait reports have star ratings of two, three or four stars.

Some four-star health reports are no longer available (for example, diabetes, age-related macular degeneration, bipolar disorder and stomach cancer). It is not clear why these reports are now excluded whereas potentially more controversial reports such as Alzheimer's are still available.

The display of the health reports has changed. 23andMe no longer show your risk compared to the average. This was previously presented in a somewhat alarming and confusing way so that all the conditions for which you had a higher than average risk factor, however small, were highlighted in red as though they were potentially a cause for concern. For example, in my own report I supposedly have a 0.2% risk of bipolar disorder compared to an average risk of 0.1%. In contrast I have a 50.6% risk of obesity compared to an average risk of 59%, but because my risk was lower than average, it was not picked out in the report as being of special concern even though I am much more at risk of obesity than I am of bipolar disorder.
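The two examples above show the difference between relative and absolute risk. The sketch below uses the figures from my own report. The flagging rule (highlight anything above average) is a simplified reading of how the old display behaved, not 23andMe's actual logic, and the function name is hypothetical.

```python
def summarise_risk(condition: str, your_risk_pct: float, average_risk_pct: float) -> dict:
    """Compare a personal risk with the average in relative and absolute terms."""
    return {
        "condition": condition,
        "absolute_risk_pct": your_risk_pct,
        "relative_risk": your_risk_pct / average_risk_pct,
        # Simplified assumption: the old display flagged any above-average risk.
        "flagged_in_red": your_risk_pct > average_risk_pct,
    }


bipolar = summarise_risk("bipolar disorder", 0.2, 0.1)
obesity = summarise_risk("obesity", 50.6, 59.0)

for report in (bipolar, obesity):
    print(
        f"{report['condition']:<18}"
        f"absolute {report['absolute_risk_pct']:>5.1f}%  "
        f"relative {report['relative_risk']:.2f}x  "
        f"flagged: {report['flagged_in_red']}"
    )
```

The bipolar disorder result is flagged because the relative risk is double the average, even though the absolute risk is only 0.2%. The obesity result is not flagged, yet its absolute risk is more than 250 times greater.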

I had hoped that a test aimed at the UK market would be customised with links to UK resources. However, as far as I can gather most of the resources are in fact American resources. Confusingly, when the word "national" appears it is used to refer to the US and not the UK. Similarly 23andMe advise talking to a genetic counsellor if you have concerns about your results, but they provide a link to an American company called InformedDNA and a link to the US National Society of Genetic Counselors. It would have been much more helpful to provide information about genetic counsellors in the UK and links to NHS resources.

Regardless of these minor quibbles, it's good news that the 23andMe health reports are once again available in the UK.

I have provided links to further coverage of the story below:

- 23andMe press release: http://mediacenter.23andme.com/en-gb/blog/2014/12/01/23andme-brings-ce-marked-personal-genome-service-to-the-uk/

- BBC interview with Anne Wojcicki: http://www.bbc.co.uk/news/science-environment-30288939

- BBC: http://www.bbc.co.uk/news/science-environment-30285581

- The Guardian: http://www.theguardian.com/technology/2014/dec/02/google-genetic-testing-23andme-uk-launch

- Daily Mail: http://www.dailymail.co.uk/health/article-2856789/125-DNA-test-checks-100-conditions-assess-risk-Alzheimer-s-cancer-going-bald.html

- GenomeWeb: https://www.genomeweb.com/microarrays-multiplexing/23andme-gets-ce-mark-launches-pgs-offering-uk-125

- The Verge (US): http://www.theverge.com/science/2014/12/1/7316089/23andme-expands-to-the-uk-despite-us-restrictions

- PHG Foundation: http://www.phgfoundation.org/news/16442/

Further reading
- My series of articles on my 23andMe test. Note that my test was done on the Version 2 chip before the launch of the new test on the v4 chip in the UK.
- Tim Janzen's autosomal DNA testing comparison chart