Tuesday 20 January 2015

What is the current size of the consumer genomics market?

The subject of how many people have taken a DNA test is always the source of much speculation, and reliable figures are hard to come by. However, in a report published this week by GenomeWeb, Spencer Wells, director of National Geographic's Genographic Project, anticipates that "the 3 millionth person [will] test him or herself during the next few months". In the same article Roberta Estes, who writes the popular DNAeXplained blog, suggests that the three million milestone might already have been achieved. She notes: "23andMe has stated publicly that it has genotyped 800,000 kits, AncestryDNA and the Genographic Project each has genotyped perhaps more than 700,000, and Family Tree DNA has genotyped close to 120,000 people for its Family Finder autosomal DNA offering alone." I thought I would take a look at the available sources for the different companies to see if it might be possible to verify these figures and provide an estimate of the current total.

23andMe
23andMe state in their media fact sheet that they have genotyped more than 800,000 customers.

The 23andMe test is sold in 56 countries of the world. However, I estimate that about 90% of their customer base is in the US. Canada and the UK are currently the only countries where the 23andMe test includes the health and trait reports.

The Genographic Project
The Genographic Project's home page states, as of today's date, that the project has 705,343 participants.

I understood that the Genographic Project kit could be purchased from any country in the world, but from the dropdown menu in their online shop it would appear that the kit is now sold in just 33 countries.

AncestryDNA
AncestryDNA confirmed in August 2014 that they had tested over 500,000 DNA customers. In a presentation given towards the end of last year, Ken Chahine, Ancestry's Senior Vice President and General Manager, stated that AncestryDNA were selling 30,000 to 50,000 DNA kits per month. If we take the middle figure of 40,000 and multiply it by six months, that gives us a figure of 240,000 kits sold since August 2014, bringing the total up to 740,000.
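As a rough check on this arithmetic, here is a minimal Python sketch that applies the same calculation to the low, middle and high ends of the quoted sales range. The variable and function names are my own labels, and the six-month window is the assumption used above rather than a figure from AncestryDNA.

```python
# Figures quoted above; the six-month window is an assumption, not a company figure.
CONFIRMED_AUG_2014 = 500_000   # customers confirmed in August 2014
MONTHLY_SALES_LOW = 30_000     # lower end of the quoted monthly kit sales
MONTHLY_SALES_HIGH = 50_000    # upper end of the quoted monthly kit sales
MONTHS_ELAPSED = 6             # assumed number of months since August 2014


def projected_total(monthly_sales: int, months: int = MONTHS_ELAPSED) -> int:
    """Add the assumed sales since August 2014 to the confirmed base figure."""
    return CONFIRMED_AUG_2014 + monthly_sales * months


midpoint = (MONTHLY_SALES_LOW + MONTHLY_SALES_HIGH) // 2  # 40,000

low = projected_total(MONTHLY_SALES_LOW)    # 680,000
mid = projected_total(midpoint)             # 740,000
high = projected_total(MONTHLY_SALES_HIGH)  # 800,000

print(f"low: {low:,}  mid: {mid:,}  high: {high:,}")
```

On these assumptions the 740,000 estimate sits in the middle of a plausible range of roughly 680,000 to 800,000.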

The AncestryDNA test is currently only sold in America, but there are plans to launch the test in the UK, Ireland, Australia and perhaps other countries later this year.

Family Tree DNA
Family Tree DNA provide details only on the number of different types of tests taken and not the total number of customers. According to their website, as of today's date, their stats are as follows:

- 520,257 Y-chromosome DNA records in the database. The Y-DNA database includes 180,005 people who have tested at least 37 Y-STR markers. The FTDNA database also includes several thousand people who have taken the advanced BIG Y test, a comprehensive Y-chromosome sequencing SNP discovery test. FTDNA almost certainly have the largest Y-chromosome DNA database in the world with samples tested at higher resolution than in any other database.

- 190,105 mitochondrial DNA records in the database. The mtDNA database includes 47,849 people who have taken the full mitochondrial sequence (FMS) test. (This test was previously known as the FGS - full genomic sequence test). FTDNA probably have the world's largest database of full mtDNA genomes.

- The number of autosomal Family Finder tests in the FTDNA database has not been publicly disclosed. It is not clear if the 120,000 figure cited by Roberta Estes in the GenomeWeb article mentioned above is an estimate or an actual figure obtained from FTDNA staff, but the number certainly seems to be in line with my own estimates.

FTDNA sell their tests in theory to any of the 200 or so countries of the world. However, they are unable to ship to Iran and Sudan because of customs restrictions.

FTDNA have partnerships with the European company iGENEA and the Middle Eastern company DNA Ancestry & Family Origin. These partnerships have helped to bring in many non-English-speaking customers from Europe and the Middle East, though many more customers from these regions will have tested directly with FTDNA.

iGENEA kit numbers are preceded by the letter E. The iGENEA kit numbers in my mtDNA Haplogroup U4 Project go up to kit no. E17977 so it would appear that nearly 20,000 Europeans have tested through iGENEA. Many Europeans will also have tested directly through FTDNA. (It is in fact considerably cheaper to order direct through FTDNA rather than through iGENEA, but iGENEA do have the advantage of a website which is available in French, German, Spanish and Italian.)

The kits from the Middle East are preceded by the letter M. The highest kit with the M prefix that I can find in the large Arab Tribes DNA Project is kit no. 9658 so there are perhaps around 10,000 people who have tested through the FTDNA affiliate in the Middle East.

Family Tree DNA also have partnerships with a number of smaller companies such as DNA Worldwide and Jewish Voice, though these partnerships probably only account for a few thousand kits. For details on the various prefixes see the ISOGG Wiki article on Family Tree DNA kit numbers.

The international diversity of the FTDNA database can be seen in the huge range of geographical DNA projects, which are run by volunteer project administrators from around the world.

Family Tree DNA are the testing partner for the Genographic Project, and all the Geno 2.0 tests are processed in FTDNA's lab in Houston, Texas. Genographic Project participants have the option of transferring their results into the FTDNA database. Genographic Project kit numbers are preceded by the letter N. The highest Genographic Project kit number in the Haplogroup U4 Project is kit number N129937. We therefore know that around 130,000 Genographic Project customers have transferred their results to FTDNA.

Family Tree DNA are the only company who will accept autosomal transfers from other testing companies. They can accept transfers for people who have tested at both AncestryDNA and 23andMe. However, 23andMe transfers can only be accepted if the test was done on the version 3 chip which was sold between November 2011 and November 2013. Kit numbers for the autosomal transfers are prefixed by the letter B. The same prefix is also used for Y-DNA transfers from AncestryDNA and DNA Heritage. AncestryDNA no longer offer Y-STR testing. FTDNA purchased the British company DNA Heritage in April 2011. The highest B kit I can find in my projects is B39616 in the Haplogroup U4 Project, so it would appear that there are getting on for 40,000 third-party transfers in the FTDNA database. Both DNA Heritage and AncestryDNA only ever had quite small Y-DNA databases, and in any case not everyone transferred their Y-DNA results, so I would guess that the majority of the third-party transfers (perhaps in the region of 35,000) are autosomal results from 23andMe and AncestryDNA. It is not clear if the third-party transfers are included in the estimate of the size of the FTDNA Family Finder database or if these transfers are in addition to the autosomal tests processed directly by FTDNA.
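The kit-number reasoning used in this section can be sketched in a few lines of Python. This is only an illustration: the prefix meanings are the ones described in this post, the helper names are hypothetical, and it assumes that kit numbers are issued sequentially within each prefix, so the highest number seen is at best a rough upper bound on the number of kits.

```python
import re

# Prefix meanings as described in this post; an empty prefix is a direct FTDNA kit.
PREFIX_SOURCES = {
    "": "Family Tree DNA (direct)",
    "E": "iGENEA",
    "M": "Middle Eastern affiliate",
    "N": "Genographic Project transfer",
    "B": "third-party transfer",
}

# Optional letter prefix followed by digits, e.g. "E17977" or "394825".
KIT_PATTERN = re.compile(r"^([A-Z]*)(\d+)$")


def parse_kit(kit):
    """Split a kit number into its (prefix, number) parts."""
    match = KIT_PATTERN.match(kit.strip().upper())
    if match is None:
        raise ValueError(f"unrecognised kit number: {kit!r}")
    return match.group(1), int(match.group(2))


def highest_by_prefix(kits):
    """Return the highest kit number seen for each prefix."""
    highest = {}
    for kit in kits:
        prefix, number = parse_kit(kit)
        highest[prefix] = max(number, highest.get(prefix, 0))
    return highest


# Highest kit numbers mentioned in this post.
observed = ["E17977", "N129937", "B39616", "394825"]
highest = highest_by_prefix(observed)

for prefix, number in sorted(highest.items()):
    source = PREFIX_SOURCES.get(prefix, "unknown prefix")
    print(f"{source}: highest kit number seen is {number:,}")
```

Feeding in the kit numbers from a larger set of projects would tighten the estimates, since any single project only shows the highest number that happens to have joined it.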

It is impossible from these figures to determine precisely how many individuals there are in the Family Tree DNA database because many people who have ordered a Y-DNA test will also have gone on to order a Family Finder test and/or a mitochondrial DNA test and vice versa. The kit numbers probably provide the closest approximation of the number of people in the database. My highest FTDNA kit number is kit number 394825 in the Devon DNA Project. It may well be that the 400,000 milestone has already been passed. If we assume that there are 400,000 FTDNA kits, 130,000 Genographic transfers, 20,000 iGENEA kits, 10,000 kits from FTDNA's Middle Eastern partner, and 5,000 miscellaneous kits, we get a figure of 565,000 which is probably a reasonable estimate of the number of individuals in the FTDNA database.
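The sum can be laid out as a short Python sketch so that the individual assumptions are easy to adjust. The component figures are the rounded estimates given above; the dictionary labels are my own.

```python
# Rounded estimates from the paragraph above; the labels are illustrative only.
ftdna_components = {
    "FTDNA kits (own numbering)": 400_000,
    "Genographic transfers (N prefix)": 130_000,
    "iGENEA kits (E prefix)": 20_000,
    "Middle Eastern partner kits (M prefix)": 10_000,
    "Miscellaneous partner kits": 5_000,
}

ftdna_estimate = sum(ftdna_components.values())

for label, count in ftdna_components.items():
    print(f"{label:<40}{count:>10,}")
print(f"{'Estimated individuals in FTDNA database':<40}{ftdna_estimate:>10,}")
```

Changing any one component, for example lowering the 400,000 figure to allow for kit numbers that were issued but never returned, immediately shows the effect on the overall estimate.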

Other companies
In addition to the big four companies there are a number of other smaller companies such as BritainsDNA, Oxford Ancestors and GeneBase which sell genetic ancestry tests direct to the consumer. A full list of DNA testing companies can be found in the ISOGG Wiki. However, none of these smaller companies disclose the size of their databases, and many of the people who've tested with the smaller companies have retested with one of the big four companies. I hesitate to estimate the number of people tested with these different companies but I do not think the figure can be more than 50,000 and is very likely to be much less than this.

What is the total?
To sum up, the total number of individuals tested at each of the four big companies is as follows:

Genographic Project    705,343
23andMe                800,000+
Family Tree DNA        565,000 (DK estimate)
AncestryDNA            740,000 (DK estimate)

If we add all these figures together we get a total of 2,810,343. However, this figure makes no allowance for the significant overlap between the four databases, as there are many people who have tested at multiple companies. For example, I've had my own DNA tested at 23andMe, Family Tree DNA and AncestryDNA. We can subtract the 130,000 people who have transferred their Genographic results to FTDNA and we can perhaps estimate that about 35,000 people have transferred autosomal DNA results to FTDNA. That brings the total down to 2,645,343. There is probably more overlap than I've allowed for, but it does seem very likely that there are currently around two and a half million people in the world who have paid for a DNA test with the big four companies. It will be interesting to see what these figures look like this time next year.
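For anyone who wants to rerun the final calculation with different assumptions, here is a minimal Python sketch of it. The company figures are the ones listed above (two of them my own estimates), the 23andMe figure is treated as exactly 800,000 even though the company says "more than", and the variable names are my own.

```python
# Database sizes as listed above; FTDNA and AncestryDNA are estimates.
reported = {
    "Genographic Project": 705_343,
    "23andMe": 800_000,         # stated as "more than 800,000"
    "Family Tree DNA": 565_000,  # estimate
    "AncestryDNA": 740_000,      # estimate
}

# Overlap that can be roughly quantified from FTDNA kit-number prefixes.
known_overlap = {
    "Genographic results transferred to FTDNA": 130_000,
    "Autosomal results transferred to FTDNA": 35_000,  # estimate
}

gross_total = sum(reported.values())
overlap_total = sum(known_overlap.values())
net_total = gross_total - overlap_total

print(f"Gross total:   {gross_total:,}")
print(f"Known overlap: {overlap_total:,}")
print(f"Net total:     {net_total:,}")
```

The net figure only removes the overlap that can be counted through transfers. People who have paid for separate tests at more than one company are still counted more than once, which is why the true number is likely to be somewhat lower.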

© 2015 Debbie Kennett

6 comments:

Anonymous said...

Your calculations appear to be based on the assumption that every customer number has been issued (and returned a sample). On this basis my bank has about 2 billion customers!! It would seem that the answer to your question of whether any of these claims can be verified is 'No'?

Debbie Kennett said...

The company claims can't be verified as no company is going to open up its database for an independent audit but I thought it was still worth checking sources and trying to come up with a reasonable estimate. I think it's unlikely that the big companies would misrepresent the size of their databases as this would mislead customers and I would imagine they would fall foul of consumer protection laws. It's true that not all the FTDNA kit numbers represent kits where the customer has actually returned the sample and had the order processed. Some project admins order lots of kits in sales but allocate the kits many months or even years later. Some people order a kit but then never pay for it and so the order is never processed. I have one project member who paid for a kit about five years ago but still hasn't got round to doing his test. However, in the overall scheme of things these numbers will be very small and the FTDNA kit numbers are still probably the best indicator of the number of individuals in the FTDNA database.

Charles said...

Thanks for doing the legwork on this one. The size of the DNA database is everything when it comes to selecting a DNA test service. Not everyone has the budget for all 4.

Do you know of anyone else (in the industry) who's keeping tabs on the # of records in the big 4 providers?

Debbie Kennett said...

Charles, Sorry I don't know of anyone else who has attempted to estimate the size of the different databases.

Terry Breverton said...

Thanks for this - brilliant - it was passed to me by a good (and learned) friend with a background in genetics, science and statistics. I'm really worried about the BBC - it used to be impartial but now if I stub my toe it's down to climate change. I became interested in genetic testing via writing upon the early British, and more recently in my book on Richard III.

Debbie Kennett said...

Thanks Terry. Are you referring to my posts about the DNA Cymru programme?

For once the BBC, unlike S4C, seem to have learnt their lesson and they've published a critical story about the programme:

http://www.bbc.co.uk/cymrufyw/31708205

DNA testing does have many legitimate applications and the case of Richard III is an excellent example. However, people do tend to read more into DNA evidence than they should. I wrote about Richard III's DNA here:

http://cruwys.blogspot.co.uk/2014/12/richard-iii-and-use-of-dna-as-evidence.html

It's worth reading the full scientific paper including all the supplementary data.