Cruwys news: March 2014

Sunday, 23 March 2014

Have AncestryDNA discontinued their Y-STR and mtDNA tests?

It would appear that AncestryDNA have stopped selling their Y-STR and mtDNA tests. The website now shows that the tests are out of stock and visitors are directed to the landing page for the new AncestryDNA autosomal DNA test.

I spoke to an Ancestry representative at Who Do You Think You Are? Live last month to enquire what they were planning to do about their Y-DNA and mtDNA tests. I was told that they had at one time considered phasing them out but that they still regularly receive orders every week from a few projects. They have now supposedly set up the system to make the Y-DNA tests easy to find for those who need them but difficult for everyone else to discover. However, it would appear that even if you have a Y-DNA project at AncestryDNA, there is currently no way to order a kit. It is not clear if the Y-DNA and mtDNA tests have been permanently discontinued or if this is a temporary problem.

Ancestry have not been actively marketing their Y-DNA and mtDNA tests for some time now and have focused instead on their new autosomal DNA product. Probably 95% or more of surname projects are hosted at Family Tree DNA, but a few surname projects have persisted against all the odds with AncestryDNA. AncestryDNA acquired Relative Genetics in 2008 and some of the old Relative Genetics surname projects were transferred to AncestryDNA. When Family Tree DNA bought out the British company DNAHeritage in April 2011, the DNAHeritage projects were given the opportunity to transfer their results free of charge to FTDNA, but a handful of projects decided to move their results to AncestryDNA instead. However, even where a project exists at AncestryDNA, there will inevitably be a complementary or rival project at Family Tree DNA.

I've never been a great fan of the AncestryDNA Y-STR and mtDNA tests. One of the biggest problems was that the company provided no facility for SNP testing. Y-DNA haplogroups can often be predicted with a high degree of confidence from a Y-STR haplotype, but prediction becomes much more problematical with a rare haplotype. There have been a number of reported cases of incorrect haplogroup predictions from AncestryDNA which were only discovered when the customers tested elsewhere, and there are probably many other people sitting in the AncestryDNA database blissfully unaware that they have been assigned to the wrong haplogroup.

There have been similar problems with the Ancestry mtDNA haplogroup predictions. The AncestryDNA mitochondrial DNA test sequences most of the hypervariable (non-coding) region of the mtDNA genome. While it is often possible to predict the mtDNA haplogroup from HVR results, sometimes there is ambiguity and SNPs from the coding region must be tested for confirmation. Ancestry do not offer any facility to upgrade mtDNA test results or to order SNP testing to confirm the mtDNA haplogroup. Other companies do offer mtDNA haplogroup backbone tests or include some coding region SNPs in the cost of the test. Family Tree DNA's mtDNAPlus test (HVR1 + HVR2) includes a free mtDNA haplogroup backbone test covering a panel of 22 coding region SNPs to confirm the haplogroup. FTDNA are the only company to offer a standalone full mitochondrial sequence test, which provides a detailed haplogroup assignment and matches within a genealogical timeframe. GeneBase also offer a full sequence test, but it is only available as an upgrade and costs almost twice as much as the FTDNA test.

In addition to the haplogroup problems, Ancestry provided very little in the way of support for surname projects. The interface was very primitive and hard to use, and no improvements were made after the service was launched back in October 2007. Ancestry only ever offered two basic Y-STR tests, for 33 markers and 46 markers. They inflate their marker count by including three markers for which the majority of the population will have a null value. Other companies report values for these markers if found but don't routinely include them in the marker count.

The Ancestry marker panels are also supposedly less useful for genealogical purposes because they have a slower mutation rate. The genetic genealogist John Robb has done an interesting study comparing the mutation rates of the AncestryDNA and Family Tree DNA Y-STR markers which can be found here:

www.johnbrobb.com/Content/DNA/MarkerPanelsCompared.pdf

There are occasions when it is necessary to order additional markers, for example to subdivide family groupings or when one has a large number of matches. Unfortunately Ancestry did not have any facility to order extra markers. Other companies offer upgrades to 67 markers, and Family Tree DNA even offer a 111-marker test.

However, Ancestry did have one advantage over Family Tree DNA in that they always reported microalleles (fractional marker values such as 14.2). While microalleles are rare, they can be genealogically useful, as the Acree surname project have found to their benefit. It is very much hoped that Family Tree DNA will eventually provide the facility to report microalleles.

If you have taken a Y-STR test with AncestryDNA I recommend that you transfer your results to Family Tree DNA. The basic transfer costs US $19, and once your results have been transferred you will have the option to upgrade for an additional fee so that you can be included in the matching database. For further details see the third-party transfer section in the FTDNA Learning Center:

www.familytreedna.com/learn/transfer-y-dna-testing-results/

For comparisons of the Y-STR tests available from the different companies see the Y-STR testing chart in the ISOGG Wiki:

www.isogg.org/wiki/Y-DNA_STR_testing_chart

There is also an mtDNA testing comparison chart in the ISOGG Wiki:

www.isogg.org/wiki/MtDNA_testing_comparison_chart

Thanks to Joss ar Gall and Charles Acree for telling me about the unavailability of the AncestryDNA tests.

Update 23 March 2014
Stephanie Ray has kindly sent me the following link which can apparently still be used to order a Y-DNA test from AncestryDNA:

http://ldna.ancestry.com/buyKitParticipant.aspx?goal=p

This page does not appear to be linked from anywhere within the AncestryDNA pages. It also transpires that it is no longer possible to access the group results from the groups menu. However, you can access the results by using this link:

http://ldna.ancestry.com/groupDNA.aspx?siteId=57076542

The link only works when you are logged into your own Ancestry account. The number after the equals sign needs to be replaced with your own Ancestry group ID, which you can find by opening the groups menu and going to the home page for your group.
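The substitution described above is simple string handling, sketched here in Python. The helper name `group_results_url` is hypothetical, the base address is copied from the link in this update, and the check that a group ID is purely numeric is an assumption based on the example ID, not something Ancestry documents.

```python
from urllib.parse import urlencode

# Base address of the group results page, copied from the link in this update.
GROUP_RESULTS_BASE = "http://ldna.ancestry.com/groupDNA.aspx"


def group_results_url(group_id: str) -> str:
    """Build the group results link for a given group ID (hypothetical helper).

    Assumes group IDs are purely numeric, like the example in the post.
    """
    group_id = group_id.strip()
    if not group_id.isdigit():
        raise ValueError(f"expected a numeric group ID, got {group_id!r}")
    # urlencode produces the "siteId=<number>" part after the question mark.
    return f"{GROUP_RESULTS_BASE}?{urlencode({'siteId': group_id})}"


if __name__ == "__main__":
    print(group_results_url("57076542"))
```

The resulting link still only works when you are logged into your own Ancestry account.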

Update 24 March 2014
Charles Acree has advised me that he tried to ring AncestryDNA to clarify the current situation. He tells me that the Ancestry.com reps are not accepting any Y-DNA or mtDNA orders at all. They are "portraying it as a temporary situation (not a problem)" and added that "there are no new kits on order". I do not know of anyone who has tried to order a Y-DNA or mtDNA test from AncestryDNA, but Charles was told that even if an order were placed a kit would not be sent. He was further told that the link Stephanie provided, which leads to a misleading ordering page, is a glitch that will eventually be corrected.

Update 6 June 2014
AncestryDNA have now announced that they are discontinuing their Y-DNA and mtDNA tests with immediate effect. For further details see my blog post Ancestry.com announcement regarding discontinuation of Y-DNA and mtDNA tests.

Tuesday, 11 March 2014

Who Do You Think You Are? Live 2014

Who Do You Think You Are? Live is now firmly established as the biggest event in the family history calendar in the UK, and it always provides a welcome opportunity to meet up with friends and colleagues and make new connections. This is my seventh year at WDYTYA. I've been to all of the shows apart from the very first one in 2007. There was a departure from the usual schedule this year and for the first time WDYTYA was held on Thursday, Friday and Saturday rather than Friday, Saturday and Sunday. In previous years Sunday has always been the quietest day, perhaps because of the difficulties of Sunday travel on public transport. The change of days seems to have paid off. The attendance this year was 13,128, slightly down on last year's figure of 13,941, but the footfall was spread evenly across the three days. Here are the figures for comparison:

2014
Thursday 20 February: 4,253
Friday 21 February: 4,353
Saturday 22 February: 4,522

2013
Friday 22 February: 5,444
Saturday 23 February: 5,365
Sunday 24 February: 3,132
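As a quick arithmetic check on these figures, the Python sketch below totals each year's daily attendance and measures how far the quietest day fell below the busiest. That "gap" measure is a simple one chosen here for illustration, not an official statistic from the organisers.

```python
# Daily attendance figures for WDYTYA Live, as quoted in this post.
ATTENDANCE = {
    2014: {"Thursday 20 February": 4253, "Friday 21 February": 4353, "Saturday 22 February": 4522},
    2013: {"Friday 22 February": 5444, "Saturday 23 February": 5365, "Sunday 24 February": 3132},
}


def total(year: int) -> int:
    """Total attendance across the three days of the show."""
    return sum(ATTENDANCE[year].values())


def spread(year: int) -> float:
    """How far the quietest day fell below the busiest, as a fraction of the busiest."""
    counts = ATTENDANCE[year].values()
    return (max(counts) - min(counts)) / max(counts)


if __name__ == "__main__":
    for year in sorted(ATTENDANCE, reverse=True):
        print(f"{year}: {total(year):,} visitors, quietest day {spread(year):.0%} below busiest")
    change = (total(2014) - total(2013)) / total(2013)
    print(f"Year-on-year change: {change:+.1%}")
```

The totals match the quoted 13,128 and 13,941, a fall of about 5.8%, while the gap between the busiest and quietest day shrinks from about 42% in 2013 to about 6% in 2014, which is what "spread evenly" looks like in numbers.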

In another change this year, ISOGG (the International Society of Genetic Genealogy) stepped in to co-ordinate the lecture programme for the DNA workshop, which is sponsored by Family Tree DNA. I worked with Maurice Gleeson and Brian Swann to put together the programme. For the first time we invited some speakers from the world of academia to complement the genetic genealogy talks. I believe we came up with a good mix of speakers, and the talks were all very popular and very well received. The abstracts of the talks and the speaker biographies can be seen here. Maurice recorded all the lectures and, where the speakers have given permission, they are gradually being uploaded to the DNA Lectures - WDYTYA Live 2014 YouTube channel. The open-plan speaking area was not ideal and there is quite a lot of background noise, but for those who were unable to attend in person the recordings are the next best thing. I was presenting two talks this year, both of which will eventually be on YouTube.

The DNA workshop area at WDYTYA Live is always very busy. We thought that last year was exceptional because of all the publicity from Richard III, but this year the interest in DNA testing was even greater. The Family Tree DNA stand was constantly busy throughout all three days of the show, and at times it seemed as though the entire population of London had descended on Olympia to have their DNA tested.

We had to implement a triage system. Volunteers talked to people in the queue to answer any questions that they might have and to ensure that, if they did want to have their DNA tested, by the time they sat down to be served they knew exactly which test or tests they wanted and what they might expect. Queries from people who had already tested and wanted help with the interpretation of their results were referred to the helpers on the ISOGG stand. Other people were sent away with literature so that they could read up on the subject and come to a decision.

The DNA testing frenzy reached its peak at lunchtime on the Saturday. At one point there were over 20 people in the queue, which had to be subdivided into a swabbing queue and a triage queue. It almost seemed as though a mass hysteria had gripped Olympia, or perhaps it was just the British love of queuing and people didn't want to miss out! Family Tree DNA had brought many more kits than last year, and extra supplies had to be brought in from stock held in the UK by a project administrator, but by about 3.30 pm on the Saturday all the kits had been sold. People were still able to place orders, and it was arranged that the kits would be posted to them free of charge. There seemed to be far more women than men testing, which is perhaps not surprising considering that more women than men attend WDYTYA. The £35 mtDNAPlus test seemed to be particularly popular and lots of Family Finder tests were sold. Some women were buying Y-DNA tests to take away for their male relatives. All in all nearly 500 kits were sold, an all-time record, and everyone can look forward to lots of new matches in the FTDNA database in a few months' time.

I spent most of my time at WDYTYA on triage duty, and by the end of the three days I was feeling somewhat hoarse, but it was fascinating listening to people's stories as I chatted to them in the queue. I spoke to one lady who has Basque ancestry on all her lines going back to the 1500s. Another lady had come along to the show to get a kit for her father who was in his nineties. One gentleman had come over from Germany especially for the show, because there is no German equivalent of WDYTYA. I heard that there was one gentleman who had been given just two months to live, but he had come along to WDYTYA to get his DNA tested to make sure that it was preserved in the database as a legacy. His test is being expedited by FTDNA and I do hope that he lives long enough to receive his results.

I paid a brief visit to the Ancestry stand and took the opportunity to ask them if they had any intention of introducing their new autosomal DNA test in the UK. I was told that they are hoping to start selling it in the UK and a number of other countries in the first quarter of 2015. They are looking at getting the DNA testing itself done somewhere in the UK. There are plans to introduce some sort of filter based on networking which would help solve the problem of finding the useful British matches amongst the large number of Americans in the database. The filter should recognise that people are more likely to have meaningful matches with people in their own country, and these matches would appear at the top of their list. There do not seem to be any plans to introduce a chromosome browser. Ancestry recognise that experienced genetic genealogists would like a chromosome browser, but they think that most people would not know how to use one. They are working on an alternative solution, and it will be interesting to see what they come up with. I also asked what was happening with their Y-DNA tests, which are now very hard to find on the website. I was told that they had at one time considered phasing them out but they still regularly receive orders every week from a few projects. They've now tried to set up a system so that the Y-DNA tests are easy to find for those who need them but difficult for everyone else to discover.

BritainsDNA also had a stand at WDYTYA. I walked past their stand a few times but it never seemed to be very busy. Their tests are very expensive compared to the offerings from Family Tree DNA and I think they had a hard time competing.

I met up with Peter Calver of Lost Cousins to discuss the arrangements for my forthcoming talks for the Genealogy in the Sunshine conference in Portugal. I had a brief chat with Jane Taubman and Simon Orde on the Family Historian stand. I met up briefly with Princess Maria Sviatopolk-Mirski, but was dragged away to answer more questions about DNA testing. I paid a visit to the History Press stand, where I was greeted by a lady who was just about to buy a copy of my Surnames Handbook! I was able to sign a copy of the book for her. While I was there I signed a few copies of DNA and Social Networking for the publishers to use, though I'm still bemused as to why anyone would want my scrawl in a book! They'd apparently sold quite a few copies of both books as a result of my talks.

I was particularly pleased to have the chance to meet Tom Bromwich, my third cousin once removed, who was attending the show with his parents, and is the youngest family history researcher that I know. Tom started his family history research at a very young age and seems to do most of his research in the school holidays. He's a very careful and meticulous researcher and has already made good progress with his family tree. I would have taken a photo but I got summoned away to talk to a TV production crew, who wanted some advice on DNA testing for a forthcoming TV programme.

We were so busy that I only managed to escape to a handful of talks. The highlight for me was Chris Stringer's fascinating talk on the early peopling of the British Isles. I had the chance to talk to him briefly afterwards and was interested to learn that the Natural History Museum is thinking of re-testing Cheddar Man. The original DNA testing was done many years ago by Professor Bryan Sykes, but the research was never published in a peer-reviewed journal. A lot of the early ancient DNA research is now considered somewhat suspect, and it is thought that the sample has probably been contaminated by modern DNA.

I also enjoyed John Rowlands's talk on "The perpetual incognito of being a Jones: overcoming problems with surnames in Wales", the content of which was based on material from the newly published revised edition of The Surnames of Wales, written by John with his wife Sheila. This book is the bible for anyone studying Welsh surnames, and the new edition benefits from much new material and many new maps. I was also very pleased that I finally had the chance to meet Sheila, as we have corresponded a lot over the last couple of years and become good e-mail friends.

On Saturday I attended Bruce Winney's talk on the People of the British Isles Project. The good news is that with any luck the long-awaited paper with all the wonderful maps should be published very soon. The authors have been dealing with referees' comments and should by now have resubmitted the paper.

There have been a lot of rumours flying around about the future of Who Do You Think You Are? Live, and there has not been any official announcement from the organisers, but it appears that the show will either not be held at Olympia next year or will be held on different dates. I am told that none of the exhibitors have as yet been given the chance to renew their bookings for 2015. The nearby Earls Court arena is being knocked down to make way for new housing, and exhibitions normally held at Earls Court are moving to Olympia. It has been suggested that the show might be held at ExCeL London, the NEC in Birmingham or even in Manchester. Although it would be great to see a version of WDYTYA Live in other parts of the country, I think there is always going to be a need for a large family history show in London. No doubt we will hear something soon.

The good news is that it has been announced that Who Do You Think You Are? Live is coming to Scotland this year as part of Homecoming Scotland 2014. The Scottish WDYTYA will be held at the Scottish Exhibition and Conference Centre in Glasgow from 29th to 31st August. Unfortunately I will not be able to attend as the event clashes with the Essex Society for Family History's 40th anniversary conference in Basildon. I am one of the guest speakers at this conference and will be hoping to enlighten the audience about the mysteries of DNA testing.

A number of other bloggers have written reports from WDYTYA Live. Emily Aulicino has done a nice write-up with lots of photos of all the genetic genealogists on her Genealem blog. Jo Tillin, a fellow member of the Guild of One-Name Studies, has done a great job of providing a round-up of all the other blog posts from this year's Who Do You Think You Are? Live, which you can find on her Full Circle Family History blog.

I leave you with a selection of photos from the show. Enjoy!

The DNA workshop schedule.

Maurice Gleeson explaining to a captive audience how to analyse
autosomal DNA test results.

Chris Stringer of the Natural History Museum gave a fascinating talk on
human origins. Here he discusses the discovery of the 800,000-year-old
footprints found in Happisburgh, Norfolk.

I get to meet Chris Stringer.

Connie Fisher of Sound of Music fame talking to Max Blankfeld of
Family Tree DNA. Photograph courtesy of Max Blankfeld.

Emily Aulicino telling a full house about her autosomal DNA success stories.

The crowds on the Family Tree DNA stand.

Triage in action on the FTDNA stand. Photo by Joss ar Gall.

At times the queue for DNA testing got so long that it wound right round
the corner to the next stand. The people at the front of the queue are patiently
waiting to be swabbed. Triage is in operation at the back of the queue.

The packed Family Tree DNA stand. There was standing room only for the
lecture in the DNA workshop. Photo by Joss ar Gall.

Sue and Anne on the ISOGG stand with the poster in the background
with the list of surnames for which sponsored DNA tests were available.
Photo by Joss ar Gall.

Richard answers questions on the ISOGG stand. Photo by Joss ar Gall.

There was standing room only for most of the DNA lectures.

A quieter moment at the end of the day but all four seats are still occupied with
FTDNA customers having their DNA tested. The organisers had to come and
tell FTDNA to stop selling so that they could shut the hall up for the night!

An exhausted but happy team of genetic genealogists with Bennett
Greenspan and Max Blankfeld of Family Tree DNA at the end of the show.
Photo by Joss ar Gall.

Bennett and Max shared a bottle of champagne and some wine, kindly provided
by the organisers, with all the volunteers. We were promptly ticked off by an
official for breaking the health and safety regulations by drinking alcohol
during "take down" (the time when all the stands are dismantled), but by then
all the alcohol had gone!

Friday, 7 March 2014

More pseudoscience from Alistair Moffat on the BBC

It is ironic that on the very day it was announced that the BBC had upheld a complaint about a misleading interview given by Alistair Moffat on the BBC Radio 4 Today programme, the BBC decided to give him yet another opportunity to promote BritainsDNA, his genetic ancestry testing business. His latest interview was on yesterday’s edition of the Mark Forrest show on BBC Local Radio. You can listen to the interview for the next six days on the BBC iPlayer. Here is the direct link:

http://www.bbc.co.uk/programmes/p01s6pt4

The interview starts at around 2 hours, 4 minutes and 30 seconds into the programme.

Once again the for-profit nature of BritainsDNA is disguised. Alistair Moffat is introduced as “a historian and the managing director of BritainsDNA, a project set up to map DNA across the British Isles”. Although Moffat does make it clear that people have to pay for the DNA tests, he gives the false impression that all the profits are ploughed back into the company for research purposes: “What we do when people pay for a test is we plough what we get from customers back into research.” We have yet to see any "research" from BritainsDNA published in a peer-reviewed scientific journal.

The interview is full of inaccurate statements and misleading claims. Here are some examples:

“What we have discovered, Mark, is that Viking blood still runs very deep in Britain. We’ve done research recently where we’ve looked at the [Y-chromosome] DNA of 3500 men and we think that almost a million men in Britain – one in every 33 British men – can claim to be the direct male-line descendants of the Vikings. And it’s extraordinary that that is so clearly present in the modern population.”

“We can tell when it [the Y-chromosome marker] arose… where it arose, and we can sometimes track its movement.”

“I have Scandinavian DNA, and it comes from Northern Denmark and from Norway so I’m a Viking and I know that because I did a test which looked at my Y-chromosome and it was able to track it back to Scandinavia and because it was attached to a historical event I’m pretty sure that I came over with the Vikings.”

“If you have Viking DNA we can tell you.”

“I haven’t got much hair and I’m not blond but I’m still a Viking.”

“My mitochondrial DNA from my mum is from Pakistan 30,000 years ago – quite remarkable – and her ancestors made this extraordinary trek across the face of the earth to get to Scotland from Pakistan.”

All of the above is of course complete nonsense. It is not possible to tell where any specific “marker” arose thousands of years ago simply by testing the DNA of living people. We can get a good idea of the present-day distribution of Y-chromosome and mtDNA lineages, but the present-day location of a lineage does not necessarily correlate with its distant origins.

For further information on the reasons why we cannot make these extrapolations from Y-chromosome and mitochondrial DNA tests see the Understanding genetic ancestry testing page on the UCL website.

The BBC have either wittingly or unwittingly given Alistair Moffat and his BritainsDNA testing company a huge amount of free publicity in the last few years, and have failed to give any independent geneticists the opportunity to counter his ludicrous stories. See the PR attack on the BBC page on the UCL website to understand the full scale of the problem.

Update May 2015
The BBC have finally redeemed themselves and have produced an excellent documentary on Radio 4 introduced by Dr Adam Rutherford entitled "The Business of Genetic Ancestry". Some of the misleading claims from BritainsDNA are examined in the programme.

Related blog posts
- Alistair Moffat, BritainsDNA and the BBC - a "uniquely British farce"
- BritainsDNA, the BBC and Eddie Izzard
- The British: a genetic muddle by Alistair Moffat
- BritainsDNA, The Times and Prince William: the perils of publication by press release

Thursday, 6 March 2014

Alistair Moffat, BritainsDNA and the BBC - a "uniquely British farce"

After a prolonged and frustrating complaints process, the BBC has finally upheld a complaint brought by my colleague Professor David Balding of University College London (UCL) about the now infamous radio interview on the Today programme between Jim Naughtie and Alistair Moffat, the Managing Director of BritainsDNA and the current Rector of St Andrews University. The interview was deemed to be in breach of the BBC’s guidelines on both "accuracy" and "product prominence". Fraser Steel, Head of Editorial Complaints, writing on behalf of the BBC, conceded that Alistair Moffat “spoke in terms which either went beyond what could be inferred with certainty from the evidence or were simply mistaken” and that “some of the terms used on this occasion conduced to an exaggerated impression of what was possible”. He considered that “the programme-makers should have done more to guard against this”.

With regard to the issue of product prominence, Mr Steel concluded: "it seems to me that Mr Moffat’s statement that 'we subsidise it massively' may have contributed to an impression that it [BritainsDNA] was a disinterested research study (an impression which Mr Naughtie’s description of the company as a 'DNA database' and this reference to 'people who give their DNA for the project' would have done nothing to dispel)... it seems to me that the reference to the website amounted to undue prominence for what is in fact a commercial organisation..."

The BBC have promised to put a summary of the outcome of the complaint on their Complaints Website, together with the actions they propose to take in response to the finding. We have been informed that this is the responsibility of the News Department, and that the summary and actions should be published within the next couple of weeks. In the meantime there is a brief account of the story in the latest issue of Private Eye (No. 1361, 7 - 20 March 2014, p13). (Update: The summary of the upholding of the complaint was finally published on the BBC's Editorial Complaints Unit's website on 15th April 2014 and can be found at: http://www.bbc.co.uk/complaints/comp-reports/ecu/today9july2012radio4. A summary has also been provided as a Correction and Clarification.)

Although not disclosed by the BBC in the Today interview, Alistair Moffat and Jim Naughtie are old friends. Jim Naughtie publicly endorsed Alistair Moffat's bid to become Rector of St Andrews. The issue of this conflict of interest is still under investigation by the BBC but is being handled by management in the BBC News Department. David Balding was advised on 20th February that he can expect a response within 20 working days. Somewhat surprisingly, Mr Steel advised that Jim Naughtie was "unaware of the financial structure of BritainsDNA at the time of the interview", but even if Naughtie did not know of the commercial interests there seems to be no excuse for his failure to ask more probing questions in response to his friend's ludicrous claims.

However, the most troublesome aspect of this whole affair has been Alistair Moffat’s use of legal threats in an attempt to silence legitimate criticism and stifle public scientific debate. Professors David Balding and Mark Thomas at UCL wrote privately to the then BritainsDNA scientists expressing their concerns about the Today interview. They were subsequently the recipients of a threatening letter from Alistair Moffat's solicitor, but bravely held their ground and eventually went public with their concerns, after failing to get a satisfactory response to private e-mails. Students writing for The Saint, the St Andrews University student newspaper, were similarly intimidated by threats to sue when they tried to cover the events, but they courageously ignored the threats and went ahead and published their story. Although much of the affair is already in the public domain, the full facts have not been revealed. Now, to coincide with the upholding of the BBC complaint and for the sake of transparency and public interest, a new UCL website has been launched which documents the events in full and provides links to all the relevant correspondence, including all the legal threats and the complaints to the BBC. The website can be found here:

www.ucl.ac.uk/mace-lab/genetic-ancestry

I hope that anyone else who has been similarly intimidated by threatening legal letters will take inspiration from this case and will be encouraged to stand up for their principles.

It is interesting to note that this is not the first time that Alistair Moffat's attempts to take legal action have backfired on him. In 1999 he lost a £25,000 defamation case that he brought against the West Highland Free Press. He objected to being described as "the Laird o' Coocaddens' in-house bully" in the newspaper's diary column. The judge "did not accept that the article... was attacking Mr Moffat's private character or business reputation, or that the words were capable of being read that way" and he dismissed the action.

Nature memorably described the Moffat/UCL case as “a messy and perhaps uniquely British farce”. The affair highlighted the antiquated English libel laws which, rather than protecting the interests of society, had the effect of restricting free speech and suppressing academic debate. Following nearly five years of campaigning by the Libel Reform Campaign, Sense About Science, and other organisations and individuals, a new Defamation Act came into force in England and Wales on 1st January 2014. Although it remains to be seen how the new law will be interpreted in practice, it seems likely that it will have the effect of restricting such trivial and vexatious claims. If the new Defamation Act had been in force at the time of the Moffat/Naughtie interview it is quite possible that the whole sorry saga would never have happened.

The new UCL website also highlights some of the problems with the haplogroup stories provided by BritainsDNA, but it should be noted that BritainsDNA is not the only genetic ancestry company providing misleading stories. Furthermore, there have been many papers published in the peer-reviewed scientific literature which make similar subjective and unsubstantiated claims about the origins of Y-chromosome and mitochondrial DNA haplogroups. Advances in ancient DNA testing and the new next-generation sequencing tests, which will provide ever-greater resolution of the Y-chromosome and mitochondrial DNA trees, will no doubt expose the deficiencies in previously proposed hypotheses. It is perhaps time for a wider scientific debate on the legitimate inferences which can be made from deep ancestry tests.

Related blog posts
- More pseudoscience from Alistair Moffat on the BBC
- BritainsDNA, the BBC and Eddie Izzard
- The British: a genetic muddle by Alistair Moffat
- BritainsDNA, The Times and Prince William: the perils of publication by press release

Monday, 3 March 2014

The case of Moulay Ismael - fact or fancy

Image courtesy of Wikimedia Commons.

The Moroccan ruler Moulay Ismaïl Ibn Sharif (1634? or 1645? – 1727), also known as Moulay Ismael the Bloodthirsty or the Warrior King, is believed to hold the world record for the highest number of offspring for any man throughout history, but the facts are a matter of some debate. A contemporary report from 1704 records that Moulay had 600 sons by four wives and 500 concubines. Daughters by his four wives were allowed to live, whereas daughters born to his concubines were suffocated by the midwives at birth. This gives a total of approximately 1171 children from 500 women in a reproductive time span of 32 years (25–57). A new scientific paper using computer modelling has attempted to determine whether such a feat was actually possible. Even using more conservative assumptions, the authors concluded that the Emperor's reproductive success was plausible, but he would have had to have sex every day for thirty-two years. The authors do not seem to have made any allowances in their simulations for multiple births. Nor have they taken into account his reproductive history before he became emperor, as they consider that he would probably not have had a comparable harem by then. I'm not aware of any projects focusing on Moroccan Y-chromosome DNA, but it would be very interesting to see if there is a legacy of the emperor's reproductive success in the DNA of living males in Morocco today. If any males are interested in breaking the record, you might like to know that a breeding pool of between 65 and 110 women in your harem leads to the maximum reproductive outcome!
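As a rough sanity check on the 1704 figures, the arithmetic below simply divides the approximate totals by the 32-year span. This is a back-of-envelope sketch, not the authors' simulation: it assumes births were spread evenly over the period and uses an average year of 365.25 days.

```python
# Back-of-envelope check of the Moulay Ismael figures (not the paper's model).
# Assumption: births spread evenly over the reproductive span.

CHILDREN = 1171          # approximate total implied by the 1704 report
SONS = 600               # sons recorded in the 1704 report
SPAN_YEARS = 32          # reproductive span quoted in the paper
DAYS_PER_YEAR = 365.25   # assumption: average year length including leap years


def births_per_year(children: int, years: int) -> float:
    """Average number of births per year over the span."""
    return children / years


def births_per_day(children: int, years: int) -> float:
    """Average number of births per day over the span."""
    return children / (years * DAYS_PER_YEAR)


if __name__ == "__main__":
    print(f"children per year: {births_per_year(CHILDREN, SPAN_YEARS):.2f}")
    print(f"sons per year:     {births_per_year(SONS, SPAN_YEARS):.2f}")
    print(f"children per day:  {births_per_day(CHILDREN, SPAN_YEARS):.3f}")
```

That works out at roughly one birth every ten days for 32 years. If he really did have sex every day, only about one encounter in ten would have needed to result in a recorded birth.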

Here is the abstract from the paper:

Textbooks on evolutionary psychology and biology cite the case of the Sharifian Emperor of Morocco, Moulay Ismael the Bloodthirsty (1672–1727) who was supposed to have sired 888 children. This example for male reproduction has been challenged and led to a still unresolved discussion. The scientific debate is shaped by assumptions about reproductive constraints which cannot be tested directly—and the figures used are sometimes arbitrary. Therefore we developed a computer simulation which tests how many copulations per day were necessary to reach the reported reproductive outcome. We based our calculations on a report dating 1704, thus computing whether it was possible to have 600 sons in a reproductive timespan of 32 years. The algorithm is based on three different models of conception and different social and biological constraints. In the first model we used a random mating pool with unrestricted access to females. In the second model we used a restricted harem pool. The results indicate that Moulay Ismael could have achieved this high reproductive success. A comparison of the three conception models highlights the necessity to consider female sexual habits when assessing fertility across the cycle. We also show that the harem size needed is far smaller than the reported numbers.

The scientific paper by Elisabeth Oberzaucher and K Grammer in PLOS ONE (February 14, 2014 DOI: 10.1371/journal.pone.0085292) can be found here.

Update 15th April 2016
A new Morocco DNA Project has been established at Family Tree DNA. The project accepts Y-DNA, mtDNA and Family Finder results.

Saturday, 1 March 2014

The BIG Y roll out – the SNP tsunami is on its way!

The genetic genealogy community has been eagerly anticipating the arrival of the so-called SNP tsunami for several months and it now seems that the first waves are starting to appear on the horizon. I was one of a select few genetic genealogists and bloggers who were invited to participate late on Thursday afternoon (UK time) in a private webinar led by Dr David Mittelman, Family Tree DNA’s Chief Scientific Officer, in preparation for the rollout of the first results from FTDNA’s next-generation sequencing BIG Y test.¹ During the webinar we were given a sneak preview of some sample results from the test and we had the opportunity to ask lots of questions. I don't know what it says about me and my enthusiasm for Y-SNP testing but I seemed to be the one asking most of the questions! I am very excited about the implications of comprehensive Y-chromosome sequencing. These tests will not only allow us to define the exact branching within each haplogroup but will also reach right down into genealogical time and will eventually make it possible to delineate recent branches of the Y-line and identify the common ancestor almost down to the exact generation.

Background
There are almost 60 million base pairs in the Y-chromosome, but about half of it is made up of repetitive sequences which have yet to be deciphered. Only around 20 million or so bases are good candidates for sequencing.²,³ The BIG Y test was designed to provide the most information at the most affordable price. The intention is also to present the information in the clearest and easiest-to-use way.
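The reference figures quoted from Poznik et al (2013) in footnote 2 can be checked with a few subtractions. The sketch below removes each listed region from the 59.36 Mb total; the small remainder above the 22.98 Mb "assembled reference sequence" is presumably the eight smaller gaps, whose combined size the quoted passage does not state, so that figure is only inferred here.

```python
from decimal import Decimal

# Figures quoted from Poznik et al (2013), in megabases (Mb).
TOTAL_REFERENCE = Decimal("59.36")
EXCLUDED = {
    "heterochromatin (q arm)": Decimal("30"),
    "centromere": Decimal("3"),
    "pseudoautosomal region (2.65 Mb)": Decimal("2.65"),
    "pseudoautosomal region (330 kb)": Decimal("0.33"),
}
ASSEMBLED = Decimal("22.98")   # "assembled reference sequence"


def remaining_after_exclusions(total, excluded):
    """Subtract every excluded region from the total reference length."""
    return total - sum(excluded.values())


remaining = remaining_after_exclusions(TOTAL_REFERENCE, EXCLUDED)
# Inferred, not stated in the quote: combined size of the eight smaller gaps.
implied_gaps = remaining - ASSEMBLED

print(f"after exclusions:   {remaining} Mb")     # 23.38 Mb
print(f"implied small gaps: {implied_gaps} Mb")  # 0.40 Mb
```

Decimal is used rather than float so that the megabase figures subtract exactly.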

There seems to have been some confusion about how much of the Y-chromosome is sequenced for the BIG Y test so I asked Dr Mittelman for clarification. He advised that the test sequences around 13.5 million bases on the Y-chromosome and provides results for between 11.5 and 12.5 million positions. It is not possible to give a precise figure because NGS results vary from person to person. This is an improvement on the specification advertised when the pre-sale was announced in November, when a figure of 10 million bases was quoted.

When the BIG Y pre-sale was announced the coverage was advertised as 60x (the figure refers to the average number of times each position is read by the Illumina sequencing machines; the more reads the better). The information on the BIG Y FAQ page has since been updated and the coverage is now being advertised as “55x to 80x average coverage”.
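Average coverage is essentially the total number of bases read divided by the size of the region being sequenced. The sketch below uses the 13.5 million base target mentioned above and a hypothetical read length of 100 bases; the read length actually used for the BIG Y is not stated here, so the read counts are illustrative only.

```python
def mean_coverage(num_reads: int, read_length: int, target_bases: int) -> float:
    """Average depth: total bases read divided by the size of the target."""
    return num_reads * read_length / target_bases


def reads_needed(coverage: int, read_length: int, target_bases: int) -> int:
    """Minimum number of reads for a given average depth (integer ceiling)."""
    total_bases = coverage * target_bases
    return -(-total_bases // read_length)


TARGET = 13_500_000   # approximate bases sequenced by the BIG Y
READ_LENGTH = 100     # hypothetical read length, for illustration only

for depth in (55, 60, 80):
    print(f"{depth}x needs about {reads_needed(depth, READ_LENGTH, TARGET):,} reads")
```

Because this is an average, some positions will be read many more times and others hardly at all, which is one reason why the number of positions reported varies from person to person.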

The roll out
The BIG Y tests have been processed in the order in which they have been received, but some people had to supply new DNA samples so their tests will take longer. The first 100 results were released on Thursday 27th February, and there will be a gradual roll out of results running through to the end of March. We had been expecting all the BIG Y results to be released on the same day but it now appears that the anticipated tsunami will be more of a steady trickle of waves – a slow-motion tsunami⁴ – rather than one giant flood of data. The following message is now being displayed on the personal pages of people who are awaiting their results:

"We expect that all samples ordered during the initial sale (last November & December) will be delivered by March 28th. We are processing samples in first come first serve order. If a sample doesn't pass quality control, we will place it in the next set of results to be processed as long as we have enough DNA sample. If we require an additional sample, we will send a new test kit and place the new sample in the first set to be processed when it is returned."

My dad is one of the people waiting for his results but I did not place the order until the very end of the pre-sale period so his results will probably be amongst the last to be processed. Along with other people who have ordered the BIG Y test I received an e-mail this morning from Nir Leibovich, FTDNA's Chief Business Officer, apologising for the delay. He advised: "The entire FTDNA team has been working very hard over the last few months with high determination and many late nights. Launching a new product is always a challenge with many moving parts, some more predictable than others. Unfortunately we ran into some surprises beyond our control when one of our suppliers ran out of certain reagents we needed for running the Big Y product... We hope you will let the wonderful product we produced make up for delays that were needed to refine it! We have updated expected results dates on customer pages and will work around the clock to beat them." [Click here to read the full text of the e-mail.]

How many BIG Y tests have been ordered?

I asked if we could be given an idea of the number of BIG Y tests ordered. Although a precise figure was not revealed we were told that there had been "thousands" of orders and that "FTDNA have more Y than anyone else". I know that large numbers of orders have gone through some of the haplogroup projects. There have been 149 orders in the R1b-U106 Project alone and around 340 orders in the R1b-L21 Project. If you have ordered the BIG Y test do make sure you join the relevant haplogroup project so that the very helpful and knowledgeable volunteer admins can help you to understand your results. There is a list of Y-DNA haplogroup projects in the ISOGG Wiki:

www.isogg.org/wiki/Y-DNA_haplogroup_projects.

What is reported
Screenshots of the user interface and explanations of the various features can be seen on the BIG Y page in the FTDNA Learning Center:

www.familytreedna.com/learn/user-guide/other-test-results/big-y-page

FTDNA have a big internal SNP database with details of 36,562 known SNPs. Customers will be given a list of their results for all the SNPs in the database. They will be told whether they are ancestral or derived for each position, whether or not the SNP is on the tree, the genome reference co-ordinates, their genotype (their DNA letters) and the confidence rating.
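To make the per-SNP report concrete, here is a minimal sketch of one row as a Python dataclass, together with the ancestral/derived decision. The field names, the SNP name "SNP-X" and its position are hypothetical and are not FTDNA's actual data format.

```python
from dataclasses import dataclass


@dataclass
class SnpResult:
    """One row of a per-SNP report (all field names are hypothetical)."""
    name: str
    position: int      # genome reference co-ordinate
    ancestral: str     # ancestral allele
    derived: str       # derived (mutated) allele
    genotype: str      # the tester's observed base ("DNA letter")
    on_tree: bool      # whether the SNP is placed on the haplogroup tree
    confidence: str    # "high", "medium" or "low"

    def state(self) -> str:
        """Classify the call as ancestral, derived or unexpected."""
        if self.genotype == self.derived:
            return "derived"
        if self.genotype == self.ancestral:
            return "ancestral"
        return "unexpected"   # neither known allele: worth a closer look


example = SnpResult("SNP-X", 1_000_000, "C", "T", "T", True, "high")
print(example.name, example.state())   # SNP-X derived
```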

There are three confidence levels for the SNP calls. High confidence means that all the reads essentially agree. Medium confidence means that the information looks good but it has to be manually curated. Low confidence indicates noisy data.
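The source describes the three levels qualitatively, but the thresholds FTDNA uses are not given. The sketch below shows one plausible way to grade a call by how strongly the reads at a single position agree; MIN_DEPTH, HIGH_AGREEMENT and MEDIUM_AGREEMENT are invented purely for illustration.

```python
from collections import Counter

# Hypothetical thresholds, chosen only for illustration.
MIN_DEPTH = 4
HIGH_AGREEMENT = 0.95
MEDIUM_AGREEMENT = 0.80


def confidence(read_bases: str) -> str:
    """Grade a call by the share of reads supporting the most common base."""
    depth = len(read_bases)
    if depth < MIN_DEPTH:
        return "low"        # too few reads to judge
    _, top_count = Counter(read_bases).most_common(1)[0]
    agreement = top_count / depth
    if agreement >= HIGH_AGREEMENT:
        return "high"       # the reads essentially all agree
    if agreement >= MEDIUM_AGREEMENT:
        return "medium"     # looks good but would need manual curation
    return "low"            # noisy data


print(confidence("A" * 20))             # high
print(confidence("A" * 17 + "G" * 3))   # medium
print(confidence("A" * 10 + "G" * 10))  # low
```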

NGS coverage varies from person to person but it is expected that results will be provided for between 25,000 and 35,000 known SNPs per person. The amount of overlap with the tests from Full Genomes, Geno 2.0 and Chromo 2 is not yet known, but it is expected that the BIG Y will cover 90% of the SNPs in the Geno 2.0 and Chromo 2 tests. There are a handful of people in the genetic genealogy community who have tested with all four companies. Some people have also taken the Walk Through the Y test, the previous SNP discovery test from FTDNA which utilised Sanger sequencing. Once the BIG Y results have all been released and compared with the other tests the haplogroup project admins will be able to provide better information on the overlap between all the tests.
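Once results are in hand, the overlap between two tests can be measured by comparing the genome co-ordinates each one reports. This is a minimal sketch with toy position sets rather than real test content; it assumes both lists use the same reference build, since co-ordinates from different builds are not directly comparable.

```python
def overlap(big_y: set[int], other: set[int]) -> tuple[int, float]:
    """Count positions in `other` that are also reported by `big_y`."""
    shared = big_y & other
    fraction = len(shared) / len(other) if other else 0.0
    return len(shared), fraction


# Toy position sets, not real test content.
big_y_positions = {101, 202, 303, 404, 505, 606, 707, 808, 909}
chip_positions = {101, 202, 303, 404, 505, 606, 707, 808, 909, 1010}
print(overlap(big_y_positions, chip_positions))   # (9, 0.9)
```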

Customers will also be given a separate list of novel variants. These are defined as variants which differ from the reference sequence and which are not seen in the FTDNA SNP database. Thankfully the genome reference co-ordinates will be provided which will allow comparisons with SNPs identified in tests from other providers (with the exception of BritainsDNA who have not released the co-ordinates for their new S series SNPs [see my update from 4th March below]). Dr Mittelman does not yet know how many novel SNPs to expect per person. There is currently no function to compare novel variants in the database, but the test is very much a work in progress and he is open to suggestions for new ideas.
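The definition of a novel variant amounts to a simple filter: keep the calls that differ from the reference and whose position is not already in the known-SNP database. The tuples and the known set below are hypothetical.

```python
def novel_variants(called: list[tuple[int, str, str]], known_positions: set[int]):
    """Variants that differ from the reference at positions not in the database."""
    return sorted(
        (pos, ref, alt)
        for pos, ref, alt in called
        if ref != alt and pos not in known_positions
    )


# Hypothetical calls as (position, reference base, observed base).
calls = [(100, "A", "G"), (200, "C", "C"), (300, "T", "A")]
known = {100}
print(novel_variants(calls, known))   # [(300, 'T', 'A')]
```

Keeping the co-ordinate in each tuple is what makes it possible to line up novel variants against SNPs reported by other providers.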

Information will not yet be provided on indels (insertions and deletions), but experienced users will be able to extract the information from the raw data.
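In a VCF file an indel shows up as a REF and ALT allele of different lengths, so a first pass at pulling indels out of the data can rely on that alone. This sketch assumes simple base-string alleles; symbolic alleles such as `<DEL>` are not handled.

```python
def variant_type(ref: str, alt: str) -> str:
    """Classify one REF/ALT pair by comparing allele lengths."""
    if len(ref) == len(alt):
        return "SNP" if len(ref) == 1 else "MNP"   # MNP: multi-base substitution
    return "insertion" if len(alt) > len(ref) else "deletion"


def indel_records(records):
    """Yield (position, ref, alt) for every ALT allele whose length differs from REF."""
    for pos, ref, alts in records:
        for alt in alts.split(","):   # VCF separates multiple ALT alleles by commas
            if len(alt) != len(ref):
                yield pos, ref, alt


print(variant_type("A", "G"), variant_type("A", "AT"), variant_type("AT", "A"))
print(list(indel_records([(10, "A", "G"), (20, "A", "AT,G")])))   # [(20, 'A', 'AT')]
```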

File formats
Two types of files will be provided: a VCF file and a BED file. These files are not currently available but should be ready for download some time next week.

The VCF (variant call format) file will consist of a list of all the variants identified, tagged by confidence and location. This is essentially a file showing all your differences from the reference sequence. For an explanation of the file format see the paper by Danecek et al (2011).⁵ A sample VCF file can be found in the 1000 Genomes Wiki:

www.1000genomes.org/wiki/Analysis/Variant%20Call%20Format/vcf-variant-call-format-version-41
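Each data line in a VCF file has eight fixed, tab-separated columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO), optionally followed by sample columns. The sketch below splits one line into those fields. The example line is invented, and how FTDNA will encode its confidence tags (for instance in FILTER or INFO) is not known, so treat this as a generic reader. For serious work the VCFtools package described by Danecek et al is the better choice.

```python
def parse_vcf_line(line: str) -> dict:
    """Split one VCF data line into its fixed columns and INFO fields."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, vid, ref, alt, qual, filt, info = fields[:8]
    info_dict = {}
    for item in info.split(";"):
        if item == ".":
            continue                      # "." means no INFO data
        key, sep, value = item.partition("=")
        info_dict[key] = value if sep else True   # flags carry no value
    return {
        "chrom": chrom,
        "pos": int(pos),                  # VCF positions are 1-based
        "id": vid,
        "ref": ref,
        "alt": alt.split(","),
        "qual": None if qual == "." else float(qual),
        "filter": filt,
        "info": info_dict,
    }


example = "chrY\t12345678\t.\tG\tA\t99\tPASS\tDP=45"   # invented line
print(parse_vcf_line(example))
```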

The BED file is a text file listing the ranges of positions for which it was possible to make confident calls. This file will cover all the positions that passed quality control. A useful guide to BED files can be found here:

http://genome.ucsc.edu/FAQ/FAQformat.html#format1
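One detail that catches people out is that BED ranges are 0-based and half-open (the end position is not included), whereas VCF positions are 1-based. A minimal sketch for totalling the covered bases and checking whether a VCF-style position falls inside a range; the two example lines are invented.

```python
def read_bed(lines):
    """Parse BED lines into (chrom, start, end); start is 0-based, end exclusive."""
    intervals = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        intervals.append((chrom, int(start), int(end)))
    return intervals


def total_bases(intervals):
    """Number of bases covered (assumes the ranges do not overlap)."""
    return sum(end - start for _, start, end in intervals)


def is_covered(intervals, chrom, pos_1based):
    """True if a 1-based position (as used in VCF) lies inside any range."""
    pos0 = pos_1based - 1
    return any(c == chrom and start <= pos0 < end for c, start, end in intervals)


regions = read_bed(["chrY\t100\t200", "chrY\t500\t550"])
print(total_bases(regions))              # 150
print(is_covered(regions, "chrY", 101))  # True: first base of the first range
print(is_covered(regions, "chrY", 100))  # False
```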

Information about the VCF and BED file formats will be added to the BIG Y Learning Center page in due course.

The raw data files in the form of BAM/FASTQ files will also be made available in due course but a decision needs to be made on the best way to provide the data. I imagine that the data will almost certainly be made available in the cloud, perhaps taking advantage of the new Google Genomics service, or another similar application.

Single SNP testing
The value of a DNA test is in the comparison process and the BIG Y test is no exception. It is hoped that large numbers of new SNPs will be discovered, many of which will be in a genealogical time frame. Ideally a paired testing strategy should be adopted with two very distantly related men from the same subclade taking the test. If novel SNPs are found which identify particular family groups then in theory it should be possible to order single SNPs. Single SNPs can be ordered either direct from Family Tree DNA or from Thomas Krahn’s new company YSEQ. The two companies offer a complementary range of SNPs. Single SNPs cost $35 each from YSEQ and $39 each from FTDNA. However, I suspect that if you are able to identify a SNP in the last two hundred years or so that is only likely to be shared by half a dozen men it will not be cost-effective for any company to offer a single SNP test. Much will also depend on the number of new SNPs identified in a given tree. It might well turn out to be more economical for a surname project to club together and pay for BIG Y tests for project members representing branches of the tree that are of particular interest.
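The logic behind paired testing can be expressed with two sets. In principle, and ignoring sequencing errors and recurrent mutations, novel SNPs found in both men arose before their common ancestor and may define a branch worth testing in other project members. SNPs found in only one man arose on his line since that ancestor. The SNP labels below are hypothetical.

```python
def compare_pair(a: set[str], b: set[str]) -> dict[str, set[str]]:
    """Split two testers' novel SNPs into shared and private sets."""
    return {"shared": a & b, "only_a": a - b, "only_b": b - a}


# Hypothetical novel SNP labels for two distantly related men.
man_a = {"N1", "N2", "N3", "N4"}
man_b = {"N1", "N2", "N5"}
result = compare_pair(man_a, man_b)
print(sorted(result["shared"]))   # ['N1', 'N2']
```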

There were some misleading reports emanating from the FTDNA group administrators' conference in Houston last November which suggested that FTDNA had an upper limit of 2000 on the number of new SNPs on offer. Dr Mittelman clarified that there is no limit on the number of new SNPs that can be ordered. There is a limit of 2000 on the number of SNPs that can be tested at one time on the lab deck. In theory FTDNA can calibrate as many SNPs as they can order and design, but it is a question of managing the time.
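The distinction is between a per-run capacity and a cap on the catalogue. Any number of SNPs can be handled; they simply have to be split into runs of at most 2000. The SNP names below are placeholders.

```python
LAB_DECK_LIMIT = 2000   # SNPs per run, as quoted above


def batches(snps: list[str], limit: int = LAB_DECK_LIMIT) -> list[list[str]]:
    """Split a list of SNPs into runs no larger than the per-run limit."""
    return [snps[i:i + limit] for i in range(0, len(snps), limit)]


runs = batches([f"snp{i}" for i in range(4500)])   # placeholder names
print(len(runs), [len(r) for r in runs])           # 3 [2000, 2000, 500]
```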

SNP validation
I asked whether it was necessary for SNPs identified through next-generation sequencing to be validated using Sanger sequencing. Dr Mittelman advised that with high-confidence SNPs the data is very clean and validation is not necessary. Sanger sequencing might be needed for medium- and low-confidence calls where there are flags and not a lot of data. He also advised that next-generation sequencing is being used to validate the SNPs on the new Geno chip.

Poznik et al (2013) (supplementary data) did in fact validate their NGS SNPs using Sanger sequencing and found a concordance rate of 99.92% with just one discordant genotype.²
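To give a sense of scale, concordance is (total − discordant) ÷ total. If the published 99.92% was rounded to two decimal places and there was exactly one discordant genotype, the arithmetic below finds the totals consistent with that figure. This is only an inference; the actual number of genotypes compared is given in the paper's supplementary data.

```python
from fractions import Fraction


def concordance(total: int, discordant: int) -> Fraction:
    """Percentage of genotypes that agree between the two methods."""
    return Fraction(100 * (total - discordant), total)


def totals_consistent_with(low, high, discordant=1, max_total=10_000):
    """Totals whose concordance lies in the half-open interval [low, high)."""
    return [
        n for n in range(discordant + 1, max_total + 1)
        if low <= concordance(n, discordant) < high
    ]


# 99.92% rounded to two decimals covers [99.915, 99.925).
candidates = totals_consistent_with(Fraction("99.915"), Fraction("99.925"))
print(candidates[0], candidates[-1])   # 1177 1333
```

Fractions are used so that the boundary cases are compared exactly rather than with floating-point rounding.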

White paper
Dr Mittelman advised that once all the data has been through quality control FTDNA will then produce a white paper which will provide information on some of the technical details of the test. The paper will cover performance metrics, value proposition, etc, and they also hope to look at mutation rates, something which is of great interest to the genetic genealogy community and a subject of considerable debate and disagreement! The paper should be out in the next four to six weeks or so.

The new Y-tree
BIG Y data is currently being released using the now very out-of-date and somewhat irrelevant 2010 Y-tree. Bennett Greenspan, the Chief Executive Officer of Family Tree DNA, advised in the webinar that they have had teams of people working on the new tree in collaboration with the Genographic Project. The new tree will be fully integrated with Geno 2.0. The tree needs to be ready from both the technical point of view and the graphical interface, and it seems that it is the latter which is proving more problematic. The tree is not dependent on the release of a scientific paper. Bennett advised that it might be ready in the “next several weeks”. When the new tree is finally launched, SNPs from the BIG Y will be automatically mapped on the new tree.

Third-party tools
FTDNA want to encourage people to use third-party tools to get more out of their results and to come up with new ways to analyse the data. I have previously written about YFull, a Russian company which provides a very nice Y-chromosome interpretation service. See my review from November 2013. The service is currently free if you agree to let them have your sequence, but it is expected that they will charge a fee at some point. The Full Genomes Corporation have also indicated that they might be able to analyse BIG Y data though no announcement has yet been made. With the increasing availability of Y-chromosome sequencing data no doubt other tools and analytical services will appear in the future.

Additional questions
After the webinar had finished I realised that there were still some questions that I hadn't asked and David Mittelman kindly provided me with some answers by e-mail.

Q: Are there any plans to provide results for Y-STRs?
A: Big Y does span STRs but that was not the intent of the product. So you can go to the VCF files or the raw data and you will see insertions and deletions at STRs, however, we do not plan to add this to the web page. I would much rather recommend our established and proven STR tests.

Q: Does the BIG Y raw data also include the full mtDNA genome?
A: No, it is comprehensive sequencing of the accessible parts of the Y chromosome. We, as you know, offer full mitochondrial sequencing as a separate product.

Q: Will a list of positive SNP results be posted on the Project SNP pages?
A: Yes, if they are on the tree.

Preliminary analysis of BIG Y results
The initial results from the first batch of BIG Y tests were producing an unexpectedly high number of novel variants. Vince Tilroe has analysed some of these results and reports as follows on the U106 mailing list:

It looks like many of the novel variants shared by many Big-Y testees may belong to a particular subclade below R-L20, the haplogroup to which the primary source of the anonymous male donors belongs to, whose sequences were used to build the ChrY reference assembly, and many of those may even be exclusively private to him. Greg Magoon had filtered them out from the 1KGP and FGC reports, but YFull had assigned "Y" identifiers to some of them.

I've compared novel variants from six Big-Y returns belonging to haplogroup R-L51 and below, and have so far identified 56 "novel variants" shared between at least two of them so far, but individual samples only had between 43 and 48 of those. This pretty much cuts the typical true novel variant count in half, leaving a count that is more in line to what was expected for this process.

Charles Moore, the U106 admin, has since received confirmation from another group that many of the novel variants are ancestral shared novel SNPs.
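The cross-check Vince Tilroe describes, looking for "novel" variants that recur across several testers, can be sketched as a simple count. Variants that turn up in many unrelated samples are candidates for reference-sequence artefacts rather than new branches, although related testers can also genuinely share a new SNP, so the results still need interpretation. The sample names and positions are toy values.

```python
from collections import Counter


def shared_variants(samples: dict[str, set[int]], min_samples: int = 2) -> set[int]:
    """Positions reported as novel in at least `min_samples` testers."""
    counts = Counter(pos for variants in samples.values() for pos in variants)
    return {pos for pos, n in counts.items() if n >= min_samples}


def filtered(samples: dict[str, set[int]], shared: set[int]) -> dict[str, set[int]]:
    """Each tester's novel variants with the widely shared ones removed."""
    return {name: variants - shared for name, variants in samples.items()}


# Toy data: three testers' novel-variant positions.
samples = {"A": {1, 2, 3, 10}, "B": {1, 2, 3, 20}, "C": {1, 2, 30}}
common = shared_variants(samples)
print(sorted(common))             # [1, 2, 3]
print(filtered(samples, common))  # {'A': {10}, 'B': {20}, 'C': {30}}
```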

Other SNP tests
Full Genomes Corporation is the only other company which currently offers comprehensive Y-chromosome sequencing. Their test is substantially more expensive than the BIG Y but sequences more of the Y-chromosome. When the BIG Y raw data files become available it will be possible to do a comparison of the two tests. For comparisons of the available SNP tests, including the Geno 2.0 and Chromo 2 chip tests, see the SNP testing comparison chart in the ISOGG Wiki.

What are we going to do with all these SNPs?
I wrote in a previous blog post about the confusion of SNPs generated by the various SNP tests offered by the different testing companies. We now have a situation where four companies/organisations (Family Tree DNA/Genographic Project, Full Genomes, BritainsDNA/ScotlandsDNA and YFull) are maintaining their own proprietary SNP databases. There is a great need for an open access independent database of validated SNPs. ISOGG (the International Society of Genetic Genealogy) is probably in the best position to produce such a database, but it also has responsibility for maintaining the Y-SNP tree. The sheer amount of data generated from the next-generation sequencing tests will represent a significant challenge for the volunteer Y-SNP team. I do wonder whether the present tree system is actually sustainable and whether, in the long run, it might be better to report results as differences from the reference sequence, as is the practice for mitochondrial DNA. Whatever happens, we will have an interesting year ahead of us.
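For illustration, reporting results as differences from the reference could look like the mtDNA convention of reference base, position and observed base (as in A73G). This is purely a presentation sketch; the Y positions below are toy values, not real SNPs, and no company currently reports Y results in this form.

```python
def difference_list(variants: list[tuple[int, str, str]]) -> list[str]:
    """Format (position, ref, alt) tuples in mtDNA style, e.g. A73G."""
    return [f"{ref}{pos}{alt}" for pos, ref, alt in sorted(variants)]


# Toy Y-chromosome differences as (position, reference base, observed base).
print(difference_list([(1200, "C", "T"), (500, "A", "G")]))   # ['A500G', 'C1200T']
```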

Are you interested in ordering the BIG Y or another SNP test?
My advice for anyone thinking of ordering SNP testing is to be patient and wait for a few months until all the results from the first batches of BIG Y and Full Genomes tests have been analysed and compared. Once this process has been completed we will have a better picture of the new Y-chromosome landscape and the shape of the tree, and it will then be possible to make an informed choice as to which test to purchase. Dr Mittelman advised that there are no immediate plans for another BIG Y sale. At the moment the priority is to bring down the turnaround time for new orders, which is currently 8 to 10 weeks.

If you are interested in being involved make sure you join the relevant haplogroup mailing lists and Facebook groups. If you've tested at Family Tree DNA make sure you join the appropriate haplogroup or subclade project. The mailing lists and groups are usually linked from the haplogroup project websites. There is also a list of mailing lists and Facebook groups in the ISOGG Wiki:

www.isogg.org/wiki/Genetic_genealogy_mailing_lists

Further information
There is a set of BIG Y FAQs in the FTDNA Learning Center:

www.familytreedna.com/learn/y-dna-testing/big-y

The BIG Y page in the Learning Center provides screenshots and descriptions of the user interface:

www.familytreedna.com/learn/user-guide/other-test-results/big-y-page

Elise Friedman presented a webinar on 28th February on the subject of "Getting to know BIG Y Results". A recording of the webinar should eventually be made available in the webinar archive in the Learning Center:

www.familytreedna.com/learn/ftdna/webinars

Update 2nd March 2014
The recording of the BIG Y webinar is now available online and can be accessed via this link (free registration required):

https://attendee.gotowebinar.com/recording/4739415541486853122

Update 3rd March 2014
I have put the full text of the letter from Nir Leibovich, in which he apologises for the lack of communication about the expected date of release of BIG Y results, online here. Despite expectations to the contrary, it was never FTDNA's intention to deliver all the results on 28th February. That was the date when the results were expected to start rolling out. It also transpires that there is currently no way for FTDNA to change the expected date on customers' personal pages until the expected date has actually passed.

I've received a number of comments about the problem with reagents which contributed to the delay. Dr David Mittelman has contacted me to clarify the issue:

"We sequence the Y using Illumina HiSeq equipment and we ran out of reagents to do this, and for a period in December and January, Illumina had a back order in place so we could not order more. Illumina filled the orders in the second half of January and we continued our work. Back orders happen and since Illumina is the only game in town, we don’t have other vendors to go to, when Illumina runs out. Of course we are now rolling out samples continuously and each week, in batches. Just like we do for all our products and just like Full Genomes and other companies do."

He adds

"In the meantime as more batches complete I am confident people will be thrilled with the data. We were able to deliver better specs than I originally promised and... we will not ship subpar results to anyone. Everyone will get great data."

Update 4th March 2014
Dr Jim Wilson of BritainsDNA/ScotlandsDNA has now released a spreadsheet with details of the genome reference co-ordinates for all the Y-SNPs on the Chromo 2 chip. See the following blog post from CeCe Moore for further details and to download the spreadsheet:

- Dr. Jim Wilson and ScotlandsDNA Release Y-SNP Positions for Chromo2

Thomas Krahn has now uploaded the 8000 or so novel markers to YBrowse. This will allow the genetic genealogy community to cross-check all the new tree branches discovered by Jim Wilson earlier this year. Thomas Krahn has advised that his company YSEQ can design primers for some of the new SNPs as required.

Update 1st April 2014
Although the BIG Y .vcf and .bed files do not include mitochondrial DNA data, it now transpires that mtDNA is included in the BAM files. The mtDNA data can be extracted using third-party tools. For further details see the following blog post from Roberta Estes:

http://dna-explained.com/2014/04/01/mitochondrial-dna-results-from-the-big-y-test

See also Felix Chandrakumar's blog post on the YFull interpretation service which includes a report on the mtDNA data extracted from his BIG Y BAM file:

http://www.fc.id.au/2014/03/yfull-y-chr-sequence-interpretation.html

Update 29th August 2014
Family Tree DNA have published a white paper outlining the methodology used for the test and the analysis.

Footnotes and references
1. For links and resources on next-generation sequencing see the ISOGG Wiki page: www.isogg.org/wiki/Next_generation_sequencing

2. A good description of the Y-chromosome reference sequence is provided by Poznik et al (2013). Sequencing Y chromosomes resolves discrepancy in time to common ancestor of males versus females. Science 341 (6145): 562-565:

The Y-chromosome reference sequence is 59.36 Mb, but this includes a 30-Mb stretch of constitutive heterochromatin on the q arm, a 3-Mb centromere, 2.65-Mb and 330-kb telomeric pseudoautosomal regions (PAR) that recombine with the X chromosome, and eight smaller gaps.

This effectively leaves around 22.98 Mb of “assembled reference sequence”. If you can get hold of the Poznik paper it contains a very nice figure (Figure 1. Callability mask for the Y-chromosome) showing the regions of the Y-chromosome in which reliable genotype calls can be made.

On a side note, this paper has come in for a lot of criticism, not least for the authors' mistaken assumption that mitochondrial Eve and Y-chromosomal Adam should be expected to date back to the same time. For a critique of this paper and some useful related diagrams see the three-part series of articles by Melissa Wilson Ayres: Y and mtDNA are not Adam and Eve: Part 1; Y and mtDNA are not Adam and Eve: Part 2 - What it means to be the Most Recent Common Ancestor and Y and mtDNA are not Adam and Eve: Part 3 - Resolving a discrepancy.

3. Further papers of interest are listed on the Y-chromosome page in the ISOGG Wiki: http://www.isogg.org/wiki/Y_chromosome

4. The term "slow-motion tsunami" was coined by Charles Moore, the administrator of the R1b-U106 project: https://groups.yahoo.com/neo/groups/R1b1c_U106-S21/conversations/messages/21323

5. Danecek P, Auton A, Abecasis G et al (2011). The variant call format and VCFtools. Bioinformatics 27 (15): 2156-2158.

© 2014 Debbie Kennett
