Friday, 29 November 2013

23andMe and the FDA

The big news in the genetic genealogy world this week is the announcement that the personal genomics company 23andMe have received a stern warning letter from the FDA in which they were told that they "must immediately discontinue marketing the PGS [Personal Genome Service] until such time as it receives FDA marketing authorization for the device". However, I note that despite the warning letter 23andMe have not withdrawn their test from sale.

This is the 23andMe ad shown on national TV in the US, which probably sparked the FDA's action. It seems to me inappropriate to advertise such a product on television and I can understand the FDA's concern.

There have been many excellent articles and blog posts covering all sides of the debate so I won't comment here but will instead refer you to the best resources for further reading.

Blaine Bettinger, who writes The Genetic Genealogist blog, has given his take on the story and provided a very useful selection of links to the most interesting commentary on the subject. His post can be found here. If you are interested in the implications of the FDA's actions it's well worth reading all these links.

The journalist David Dobbs has also been tracking the coverage of the story and he has summarised all the different viewpoints and provided an extensive selection of links in his blog post FDA muzzles 23andMe after talks break down.

If you only have time to read one article on the subject I recommend reading Michael Eisen's thoughtful post FDA vs. 23andMe: how do we want genetic testing to be regulated. Michael Eisen's views most closely align with my own thoughts on the matter.

It will be interesting to see what happens in the next couple of weeks. I'm not expecting the FDA to shut down 23andMe but it might be that some of the health reports are redacted until such time as an agreement can be reached. Nevertheless it's a good idea to ensure that you have downloaded your raw data and saved the health reports that are of particular relevance. Some of the health reports can be saved as PDF files. For other reports you will need to save screenshots. If you've tested with 23andMe for genealogy purposes you might also like to take advantage of the Family Tree DNA sale to transfer your results to the FTDNA's Family Finder database. The transfer will cost $49 until the end of the year (the usual price is $69).

You can read my series of articles on my own 23andMe test using the links on this page.

Monday, 25 November 2013

YSEQ.net - a new company offering a single SNP testing service

This article is for advanced genetic genealogists who have an understanding of SNPs.
Thomas and Astrid Krahn have launched a new small business by the name of YSEQ.net. Thomas and Astrid were formerly employed by Family Tree DNA where they developed the Walk Through the Y SNP discovery programme. The new company will cater for a niche market developing custom SNPs on demand. Thomas announced the service on the Genealogy DNA list and in the ISOGG Facebook group as follows:
Expecting a flood of new SNPs from Next Gen sequencing we try to help with cleaning up the mess by offering very traditional Sanger sequencing for any marker you desire on the Y chromosome. The testing will be performed in our own laboratory.

Check out http://yseq.net/ and http://shop.yseq.net/

We don't have very many markers yet, but you can "Wish a SNP" of your choice and we'll make it available as fast as we can. We'll not limit the number of markers to 2000 or so. If you just received your NGS results, ask us for a bulk package offer by e-mail (info@yseq.net). Let me know if you have questions.
YSEQ will focus on providing SNPs that are not available with other testing companies. SNPs will be available either singly or in panels. Family Tree DNA was previously the only other company to offer single SNP testing, but they currently do not have the capacity to provide testing for more than 2000 additional SNPs.

New SNPs can be suggested. There is a $1 fee for suggestions but this is a formal spam blocker, and it will be possible to send a long list of markers at once with a single submission.

YSEQ also plan to offer a Y-STR testing service in the near future. It is expected that this service will focus on the 300 or more STRs that are included in the next generation sequencing tests but which are not currently available to test separately from any commercial provider.

YSEQ will provide a very useful and much-needed service as we anticipate the arrival of the SNP tsunami. We are entering uncharted territory with next generation sequencing of the Y-chromosome. Custom SNP testing using the tried and tested Sanger sequencing technology will be necessary to validate the new SNPs found. Custom SNP testing will also provide a cheaper method for comparative testing of SNPs to verify their placement on the branches of the Y-tree.

I wish Thomas and Astrid Krahn every success with their new venture.

Friday, 22 November 2013

Day 2 at the Royal Society's 2013 Ancient DNA Meeting

This is my second and final report from the Royal Society's Ancient DNA Meeting. See my previous post on Day 1 at the Royal Society's 2013 Ancient DNA Meeting for the full details of the meeting. As before, the accuracy of my notes and my interpretation of the lectures is not guaranteed, but I hope that some people might find the information useful until such time as the audio recordings become available.
The room starts to fill up as delegates arrive for the start of Day 2.

Robin Allaby, University of Warwick, England 
Using archaeogenomic and computational approaches to unravel the history of local adaptation in crops
Most of the plant studies to date have been on crop evolution. New computational approaches are now being used.
Barley degrades after 350 years in North African climates.
We don’t expect to see much ancient DNA from barley after about 4000 years.
There has been a shift in emphasis from "what" to "how". Scientists are now looking at how the crops got domesticated and how they adapted to new latitudes.
Plant exploitation by humans had been going on for a long time before the Younger Dryas.
He showed us models which suggested that plants which have had a rapid adaptation have a lower survival rate.
Does next generation sequencing work with low latitude samples?
He shared his research on samples from the Qasr Ibrim archaeological site in Egypt. The site had been occupied for about 3000 years by five different cultures. This is a very dry site and is particularly good for preservation as there are few bacteria found in the samples. The barley still looked edible after 1000 years.
They found evidence that a new type of barley was introduced in this region during the Christian period coinciding with the Crusades.

Alice Storey, University of New England, USA
A multidisciplinary view on the domestication and dispersal of the chicken
Alice was clearly passionate about her subject and gave a very interesting and illuminating talk. I never thought I would find chickens so interesting! As she pointed out, chickens have been transported by humans and by studying chickens we can answer questions about human history. Many of the very valid points she made were just as applicable to other disciplines.
“It is a cursed evil to any man to become as absorbed in any subject as I am in mine”. This quote has been attributed to Darwin but the source is not known. Can anyone help?
Chicken research goes back to Aristotle (384-322 BCE).
Chicken research is 20 years behind ancient DNA research into cattle and horses.
The context and provenance are important. Where something comes from and how it got there matters. “Context isn’t always what it seems”.
“The past is a palimpsest assemblage”.
A good example of the problem is the paper by Harris et al 2013 looking at chickens in Santa Cruz where they found that there had been lots of movement of chickens.
There are currently 871 chicken sequences but only 18.5% come from the wild. Many of the samples have come from zoos but they have no provenance.
People move animals around. They are portable wealth. There is a documented transfer of chickens from India to China in 1400 BC.
If you look at a modern DNA signature you are getting a mixture – an omnishambles.
Multiple DNA signatures in chickens.
Contemporary flocks are 80% foreign.
Only 17 whole mtDNA genomes.
Other samples have had 500 base pairs sequenced in the control region.
The earliest accepted domestic chicken remains have been found in Northern China.
Over 30 candidate domestication genes have been identified.
Samples can be radio-carbon-dated to fix age.
There is one full chicken genome but we can’t read it properly.

Greger Larson, Durham University, England
Testing the chronology of domestication genes using ancient DNA
This was by far the funniest and most entertaining talk of the conference. Larson told us that Ian Barnes is much funnier than he is. However, Larson can ride a bike and Barnes can't! Ian Barnes was not at the conference and I have not yet had the opportunity to hear him speak so I was unable to make comparisons.
Domestication genes are genes that control traits during the initial process and are typically fixed. Improvement genes are variable amongst domestic populations.
Chickens in the western world all have yellow legs. It’s always been assumed that because of the wide distribution of yellow legs they evolved early and it was thought that the trait was favoured by early farmers. Ancient DNA has now shown that this theory is wrong as the genes for yellow legs are not found in ancient DNA samples. Ancient DNA reveals a lack of fixation.
The reviewers (especially that pesky Reviewer Number 2!) had problems believing the research because it went against accepted thinking. Paper after paper has shown that a wide present-day distribution correlates with early evolution.
He had the audience in stitches by speculating on how his research might be received by the media:
Daily Mail: "Shocking waste of taxes on study that proves all 100 million UK chickens are dirty foreign birds".
Editors of high profile journals: "Chickens are first and best domesticated animal". (Extra brownie points for getting two superlatives into one paper!)
BBC: "Yeti proven to be giant chicken". 2bps of 16S perfectly matches a chicken.
“The past is a different country”. The vast majority of variation has gone extinct. We can’t see it in modern-day populations. Here he showed a slide of the phylogenetic trees for a number of different animals. The trees were based on both modern populations and ancient DNA research. Large parts of the trees included branches found in ancient DNA that are now extinct in modern populations. One of the trees (bears?) was particularly striking as about 90% of the tree was now extinct.
Don’t make assumptions based on modern populations however obvious they might seem.
It has always been assumed that when something is fixed in a single breed this is a sign of early origins. Strong selection leads to fixation and is followed by geographical proliferation. There has been study after study in animals which seem to prove this point. Larson’s chicken research now shows that this is not always the case. There is no link between modern ubiquity and ancient origins. Assumptions based on modern data need to be re-tested. The old papers need to be reinvestigated.
There are temporal changes in allele frequency. Bottlenecks are insane.

Comment from Alice Storey: There are good written records for chickens and with a search through the literature it might be possible to determine the date when yellow chicken legs were first reported as a result of crossing experiments. There do not appear to have been any reports of yellow legs before about 1820.

Dan Bradley, Trinity College Dublin, Ireland
Cattle and codices – aDNA in bone and parchment
There are two main types of cattle: Bos taurus and Bos indicus. Genetic data show that the two species diverged hundreds of thousands of years ago. It is thought that there were two independent domestications.
Next generation sequencing is now used for ancient DNA research in cattle. There are lots of sequence errors.
Whole genome mtDNA resolution gives greater clarity.
Ancient DNA fills out the phylogenetic history and helps with the calibration and the mutation rates. Use time-stamped variants to calibrate the tree.
[Autosomal] microsatellite genomic data also show that the two cattle species have very divergent alleles.
It’s now been shown that both species share a common ancestor.
Manuscript parchment and ancient DNA analysis.
Parchments are ubiquitous in the historical record from the 13th to the 18th century.
Parchments can be directly dated. They are robust, well preserved and valuable documents and are a good source of domestic DNA.
The ancient DNA standards suggest that we should “Do it right or not at all”. Dan Bradley says we should “Do it all or not at all”. Do it all (high-coverage next generation sequencing) is now within our reach.

Comments from audience
David Reich: Have you thought of using linkage disequilibrium?

David Lambert, Griffith University, Australia
Bursting the limits of time: ancient penguin genomics
He opened with a mention of Martin Rudwick’s book Bursting the Limits of Time which has greatly influenced him.
Georges Cuvier developed the first test of evolution 60 years before Darwin.
Jean-Baptiste Lamarck is the father of the idea of evolution. He came up with many of the key principles 50 years before Darwin.
The most recent common ancestor of penguins lived 20.4 MYA (million years ago) (17.0-23.8 MYA).
The current population of Adélie penguins is 10 million. They only nest in ice-free areas. There is a lack of genetic differentiation, but there is a lot of mtDNA diversity.
The microsatellites (autosomal?) in penguins get longer over time.
Millar and Lambert 2008 PLOS article: Mutation and evolutionary rates of Adélie penguins from the Antarctic.
There are 20 complete mtDNA genomes. Eight of these are from ancient DNA. There are 26 modern genomes at 18-30x coverage.
There is a low level of differentiation of colonies all around Antarctica. There are 35-40 ancient genomes at 1-4x coverage (mtDNA and nuclear genomes).
Penguins are an isolated population. They live in Antarctica and co-exist with only two other species. Consequently there are major opportunities for population genome studies aimed at understanding evolutionary processes.
Population genomics will enable us to better understand the genomic processes that underlie evolutionary changes (eg, mutational mechanisms).

Ludovic Orlando, University of Copenhagen, Denmark
Digging out the deep evolutionary past of equids: towards really ancient genomes
There is hardly any ancient DNA for the period from 126 KYA (thousand years ago) to 781 KYA. We have 16 base pairs from a bear in Southern Spain.
He described the methodology used to date the equus DNA extracted from a find in Thistle Creek in the Yukon Territory in the South Klondike. The equus was preserved in the permafrost.
The researchers deployed single molecule sequencing using machines from a company called Helicos Biosciences. The company has since gone bankrupt.
Paper: True single molecule DNA sequencing of a Pleistocene horse bone 1.3x – 3.4x.
Ancient DNA is short and fragmented.
Paper: Improving ancient DNA read mapping against modern reference genomes.
There are 83 complete mtDNAs of modern horses available.
Paper Achilli et al PNAS 2011: Mitochondrial genomes from modern horses reveal the major haplogroups that underwent domestication.
How to detect the degree of degradation.
The authors described the methods they had used to date the horse.
This talk was highly technical and a lot of it was above my head. Perhaps others who are more knowledgeable than me will be able to provide a better summary.

Laura Parducci, Uppsala University, Sweden
Ancient Plant DNA of Nordic environments
Plant mtDNA is very different from that of animals and has a very low mutation rate.
Ecological niche modelling.

Michael Hofreiter, University of Potsdam, Germany
(previously at the University of York, England)
The future of ancient DNA
“Predictions are difficult, especially about the future.”
We will not be able to extract any dinosaur DNA.
Homo floresiensis (Hobbit) DNA also seems highly unlikely.
There are now lots of genomes and many more in the making.
Sanger sequencing was used until 2005.
454 sequencing was introduced in 2005.
Illumina next generation sequencing started in 2009.
We can’t do de novo assembly of a genome with next generation sequencing. You have to map to something.
There is not just one past, but many pasts – many many time slices.

Questions from the audience
I wasn't sure that I had understood the question correctly, but Mark Thomas asked something along the lines of whether a sequence generated with current technology could, in theory, be used to create a new being. The answer was no, presumably because sequences are not 100% accurate.
I asked about full Y-chromosome sequencing and whether or not it might ever be deployed in ancient DNA research. The answer was that it is the worst locus to analyse. It is difficult to analyse because of all the repetitive sequences. I would like to think that Michael Hofreiter might be wrong and that the impossible will one day be possible!

Other news from the meeting
There is a new Ancient DNA Community on Google+ for both academics and members of the public.

Bruce Winney told me that, fingers crossed, he hopes the paper on the People of the British Isles Project will be submitted in the next few weeks.

Turi King has nearly finished the analysis of Richard III’s DNA. A paper won’t be submitted until next year.

Postscript
As I was compiling this post an important new paper appeared online in Nature entitled Upper Palaeolithic Siberian genome reveals dual ancestry of Native Americans. The authors have sequenced the draft genome “of an approximately 24,000-year-old individual (MA-1), from Mal’ta in south-central Siberia, to an average depth of 1×”. They claim that, to their knowledge, “this is the oldest anatomically modern human genome reported to date”.

Update
The recordings of all the lectures from this meeting are now freely available on the Royal Society's website.

See also
Day 1 at the Royal Society's 2013 Ancient DNA Meeting

© 2013 Debbie Kennett

Thursday, 21 November 2013

Day 1 at the Royal Society's 2013 Ancient DNA meeting

I spent two very interesting days this week attending the Royal Society’s meeting on Ancient DNA: the first three decades. Recorded audio of the presentations will be available on the Royal Society’s website at some point and the papers will be published in a future issue of Philosophical Transactions B. While at the meeting I made notes during the talks, and I thought that until the recordings have been uploaded to the website these notes might be of interest to those who were unable to attend the meeting. These notes are not intended to provide comprehensive coverage, and I only jotted down items that I personally found of particular interest. My primary focus is on the genealogical applications of DNA testing, and my interests will therefore not necessarily coincide with those of other researchers. Many of the technical and scientific details of the talks were well outside my expertise. The accuracy of my notes and my interpretation of the lectures is not guaranteed, but I hope that some people might find the information useful.
The Royal Society in Carlton House Terrace, London SW1 - 
the venue for the Ancient DNA meeting.

Full details of the meeting, along with speaker biographies, can be found on the Royal Society’s website. The abstracts for these talks have not been made available on the website though they are all included in the programme which was issued to attendees.

A related satellite meeting is taking place in Buckinghamshire and finishing tomorrow. The speakers’ biographies and the abstracts are available on the website for this meeting. I was not able to attend this event but I hope that other attendees will provide reports in due course.

Erika Hagelberg, University of Oslo, Norway
Ancient DNA: the first three decades
The first article on ancient DNA was published in 1984. It was a report of the cloning of a small piece of DNA from the skin of an extinct equid (a member of the horse family) that had been preserved in a museum.
The second important ancient DNA paper was on molecular Egyptology.
A lot of the early research centred on Allan Wilson’s lab.
In the early days ancient DNA testing was done on the workbench without any protective clothing.
PCR [polymerase chain reaction – a process for amplifying DNA] was introduced in the late 1980s.
The first PCR machine was made with a kettle.
The late 1990s saw the development of standards of authenticity. Hagelberg felt that the new standards stifled research and open discussion.
The big technological advances in recent years have been in bioinformatics, contamination filters and next generation sequencing.
The early studies on ancient DNA (magnolia leaf, an insect embedded in amber) are now not considered very credible. It is also difficult to reproduce these early studies.
The first ancient DNA newsletter was published in 1992.
The limit for ancient DNA was originally thought to be 5000 years.
1 March 1990 Angel of Death newspaper article on the DNA of Mengele. This was the first use of DNA in forensics.
The 1990s also saw the DNA analysis of the remains of the Russian Imperial family. Some people disputed the results.
1994 Dinosaur DNA turned out to be human DNA.
1997 Ryk Ward and Chris Stringer publish a paper in Nature in which they outline standards for ancient DNA research.
2000 Cooper and Poinar letter in Science. “Do it right or not at all”.
Hagelberg said that this was often interpreted as “Do it with me or not at all”.

Christine Keyser, University of Strasbourg, France
Past human populations in Eurasia
Keyser reported on an ancient DNA study of samples obtained from 150 graves in Yakutia in Northern Siberia.
146 bodies were found. They were frozen at the time of discovery. Genetic data was obtained from 130 bodies.
Optimal ancient DNA is obtained from bone.
Smallpox found in Yakut graves – identified by histology.
Y-chromosome analysis was done using a Y-filer kit (17 Y-STRs). There were 20 different haplotypes. A strong founder effect was found with one haplotype shared by 29 males (46%). They went up to 23 STRs on these samples but found only three differences in the 29 males.
For the mtDNA analysis they tested HVR1 and the coding region. There were 44 different mtDNA haplotypes (n=130) with haplogroups C and D predominating.
IrisPlex and HIrisPlex were used to determine hair and eye colour. Six SNPs were used to detect eye colour. The results indicated brown hair and brown eyes.
SNP testing. N1c1 was the predominant Y-DNA subclade.
Full mtDNA genomes sequenced. D5a2a most common subclade.
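
For readers who want to see how a founder effect shows up in Y-STR data, here is a minimal sketch of the two calculations behind the figures in my notes: the share of males carrying the most common haplotype, and the number of markers at which two haplotypes differ. The sample names and repeat counts below are entirely hypothetical (five made-up markers rather than the 17 in a Y-filer kit) and are not data from Keyser's study. As a rough check on my notes, 29 males at 46% would imply about 63 males typed in total.

```python
from collections import Counter

# Hypothetical five-marker Y-STR haplotypes (repeat counts per marker).
# These values are invented for illustration; they are not from the Yakut study.
haplotypes = {
    "grave_01": (14, 23, 10, 11, 13),
    "grave_02": (14, 23, 10, 11, 13),
    "grave_03": (14, 23, 10, 11, 13),
    "grave_04": (15, 24, 10, 12, 13),
    "grave_05": (14, 22, 11, 11, 14),
}


def haplotype_frequencies(samples):
    """Return (haplotype, count, share) tuples, most common first."""
    counts = Counter(samples.values())
    total = len(samples)
    return [(hap, n, n / total) for hap, n in counts.most_common()]


def marker_differences(a, b):
    """Count the markers at which two haplotypes have different repeat counts."""
    return sum(1 for x, y in zip(a, b) if x != y)


top_hap, top_count, top_share = haplotype_frequencies(haplotypes)[0]
print(f"Most common haplotype is shared by {top_count} males ({top_share:.0%})")
print("Differences between grave_01 and grave_04:",
      marker_differences(haplotypes["grave_01"], haplotypes["grave_04"]))
```

A single haplotype carried by a large share of the males, with the remaining haplotypes only a few markers away from it, is the pattern you would expect from a strong founder effect.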

Anne Stone, Arizona State University, USA
Impacts of colonisation in the Americas
Anne Stone was invited to speak at the last minute after the scheduled speaker, Ripan Malhi, had to withdraw. Malhi’s talk was to be on the subject of “The evolutionary history of Native Americans”. There is a summary of his planned talk on Science Daily in an article entitled Ancient, modern DNA tell story of first humans in the Americas.

Stone's talk focused on the impacts of colonisation in the Americas.
The initial colonisation of America took place between 18,000 and 25,000 years ago.
The post-Clovis theory of colonisation is dead.
The major part of Stone’s talk focused on the Salesian mission in Tierra del Fuego.
TB was the leading cause of death at the mission. No genetic evidence of TB found in her study.
Targeted enrichment to get full mt genome.
The genetic evidence shows that TB was already in animals in America before humans arrived.
Hershberg et al 2008 paper on the biogeography of M. tuberculosis.
The genetic testing of Native Americans depends on view of individual tribal groups.

Questions from the audience
Q What is the evidence for the pre-Clovis theory?
A The genetic evidence for pre-Clovis colonisation of America is based on signals of expansion. Human coprolite data is also pre-Clovis [coprolite = fossilised poo!].

Helena Malmström, Uppsala University, Sweden
The Neolithic transition in Scandinavia
Farming started 12,000 years ago in the Near East and 7,000 years ago in Northern Europe.
In Scandinavia hunter gatherers and farmers co-existed for a period of about 1000 years.
The hunter gatherers (Pitted Ware complex) and the farmers (Funnel Beaker complex) had different maternal lineages.
Haplogroup U was found at the highest frequency with U4 top of the list.
Autosomal SNP analysis showed that the Neolithic hunter gatherers differ from modern Europeans and were most like Sardinians and Basques.
[DK note: For background see the 2012 Nature News article by Henry Nicholls Ancient Swedish farmer came from the Mediterranean and the 2009 paper by Malmström et al.]

Carles Lalueza-Fox, Institute of Evolutionary Biology (CSIC-UPF), Spain
Neandertal paleogenomics and the El Sidrón cave
This was an excellent and sometimes humorous talk on the exciting findings from El Sidrón cave in Asturias, Spain.
Lalueza-Fox started by sharing a number of illustrations showing how our perception of Neanderthals has changed over time. We now know that they used language, and they lived in family and social groups. The final picture representing the current thinking showed a picture of a Neanderthal mother and child looking not much different from modern humans.
See also the modern reconstruction picture shared by @mjpallen on Twitter.
Lalueza-Fox took us on a photographic tour of El Sidrón cave. A group of Neanderthal individuals were found in this cave. They had been trapped in the cave after a rock fall and their DNA provides a snapshot in time of a Neanderthal social group.
Complete mtDNA genomes were obtained. Three different Neanderthal mtDNA haplogroups were found which Lalueza-Fox has labelled A, B and C. 7/12 were A, 1/12 was B and 4/12 were C. Three adult males had the same mtDNA but the three adult females had different mtDNA. This is indicative of patrilocal reproductive behaviour.
There were cut marks on all the remains – evidence of cannibalism.
Lalueza-Fox et al 2007 paper in Science. Some Neanderthals had red hair.

David Reich, Harvard Medical School, USA
Insights into population history from high coverage Neandertal and Denisova genomes
[DK comment: Why do Americans spell Neandertal without an H but pronounce the word as though it does have an H. Why do Brits spell Neanderthal with an H but pronounce it as though it doesn’t have an H?]
This was the highlight of the first day’s talks. It was delivered at breathtaking speed, barely allowing us time to digest the content on the slides. I would have liked to have had a pause button so that I could stop and look at everything again in more detail.
Neanderthal gene flow is about 2%:
1.72% in Europeans
1.89% in East Asians
(Confidence intervals were provided but the slide disappeared too quickly for me to note them.)
Autosomal DNA analysis used a recombination rate of 10cM per 10 generations, 100 cMs per 100 generations. I spotted Graham Coop’s name on this slide but wasn’t sure whether Reich was citing the paper The geography of recent ancestry across Europe.
We now have Neanderthal sequences from three different locations: Croatia, Russia and the Altai Cave in the Altai Mountains in Siberia. This is the cave where Denisovan DNA was found but the latest analyses show that Neanderthals also lived there.
Archaic split 77-114 kya.
There were multiple gene flows.
In the original Denisovan study DNA was extracted from the little finger of a young girl. The samples date back more than 50,000 years. DNA has now also been extracted from a molar.
1.9 fold coverage of genome.
Denisovans are more closely related to Neanderthals than to humans. Their mtDNA is twice as deep compared to Neanderthals than humans.
Denisovans are closely related to people from New Guinea. New Guineans have 4.6% Denisovan and in addition 2.5% Neanderthal.
2013 paper to be published on Altai Neanderthal found in same cave. Sequencing done at high resolution 52x coverage.
The archaic populations have a very low level of genetic diversity. The Altai Neanderthal are highly inbred.
Reich showed us a number of slides exploring a number of hypotheses he investigated on the relatedness of Denisovans to Neanderthals and humans. He concluded that “Denisovans harbour ancestry from an unknown archaic population unrelated to Neanderthals and modern humans”.
[DK note: This finding was anticipated by Graham Coop in his Haldane’s sieve blog post Thoughts on: The date of interbreeding between Neandertals and modern humans.]
New research has shown that Denisovan DNA is now found in East Asians. See the Cooper and Stringer 2013 paper: Paleontology. Did the Denisovans cross Wallace's Line?
Conclusion: gene flow between diverged humans was common in late Pleistocene and there were five events.

Questions from the audience
Q Does this mean humans copulated with Neanderthals? A Yes!
Q Does this mean humans fancied Neanderthals? A Yes!

Reich’s talk seemed to be the one that was attracting all the interest from the media. Ewen Callaway, the reporter from Nature, was at the conference and he has already written an article for Nature Breaking News which can be found here. There is further coverage from Michael Marshall in New Scientist.

[DK note: The abstract for this paper also mentions Neanderthal X-chromosome ancestry. I don't know if I missed the mention of the X-chromosome in this high-velocity presentation or if it was perhaps not covered. Here is the relevant extract from the abstract: "The average Neandertal ancestry on the X chromosome is about a fifth of that in the rest of the genome. It is known from studies of many species that genetic variations causing hybrid sterility concentrate on chromosome X. This is consistent with Neandertals and modern humans having been on the edge of biological incompatibility when they met and mixed."]

Johannes Krause, University of Tübingen
Ancient pathogen genomics: what we learn from historical diseases
The Black Death killed 30-50% of the population of Europe. It probably originated in China. Yersinia pestis has the biggest diversity in China.
99% of pestis genome sequenced at 30x coverage.
Yersinia pestis MRCA within last 4000 years.
There is nothing in the genome to explain the high mortality rate.

Christina Warinner, University of Oklahoma, USA
A new era in paleomicrobiology: microbiomes
If you go by the number of cells in our body we are 90% bacteria.
The bacteria in our bodies weigh around three pounds.
The bacterial genome is also known as the accessory genome.
There has been a 38-fold increase in the number of known bacteria in the last seven years.
Best estimate before NGS is 500 species of bacteria in mouth. After NGS, 19,000!
You can get lots of DNA from calculus.

[DK note: I'm afraid I was flagging at this point after a 5.15 am start to my day and only four hours' sleep. This talk was highly technical and much of it was over my head. The take-home message from the final talk was that this is an important emerging new field for the study of ancient DNA.]

Update
The recordings of all the lectures from this meeting are now freely available on the Royal Society's website.

See also
My notes from Day 2 at the Royal Society's 2013 Ancient DNA Meeting


Sunday, 17 November 2013

Family Tree DNA sale

The Family Tree DNA sale is now on. It is not very easy to find a list of prices on the website so I've copied down all the prices here. The sale ends on 31st December and all tests must be paid for in full by this date. The prices below are shown in US dollars. You can convert the prices into your local currency using one of the many online currency converters. I normally use the XE Currency Converter. Note that the dollar/sterling exchange rate is particularly favourable at present for those of us in the UK!

Basic tests for new customers
Y-DNA 37 markers $119 (usual price $169)
Y-DNA 67 markers $189 (usual price $268)
Y-DNA 111 markers $289 (usual price $359)

mtFull (full mitochondrial sequence) $169 (usual price $199)

Family Finder $99 (US customers also receive a free $100 Restaurant.com gift certificate)

Autosomal DNA Transfer $49 (usual price $69) - this allows people who have tested at 23andMe or AncestryDNA to transfer their autosomal results to FTDNA's Family Finder database

Combination Tests
Family Finder + Y-37 for $218 (usual price $268) 
Family Finder + Y-67 for $288 (usual price $367)
Family Finder + mtFull for $268 (usual price $298)
Y-37 + mtFull for $288 (usual price $366)
Y-67 + mtFull for $358 (usual price $457)
Comprehensive Genome (Family Finder, Y-67 and mtFull) for $457 (usual price $566)

Upgrades
Y-Refine 12 to 37 for $69 (usual price $109)
Y-Refine 12 to 67 for $148 (usual price $319)
Y-Refine 25 to 37 for $35 (usual price $59)
Y-Refine 25 to 67 for $114 (usual price $59)
Y-Refine 37 to 67 for $79 (usual price $109)
Y-Refine 37 to 111 for $188 (usual price $220)
Y-Refine 67 to 111 for $109 (usual price $129)
mtHVR1 to Mega (full mitochondrial sequence) for $149 (usual price $169)

Big Y
This is a new Y-chromosome sequence test for advanced users who are interested in SNP discovery and contributing to our scientific knowledge about the phylogeny of the Y-chromosome. There is an introductory offer on this new test, and it is currently on sale for $495. This test is only available to existing customers. The price will go up to $695 after 1st December. If you have previously taken the Walk Through the Y test you will be eligible for a $50 discount. There should be a voucher that you can use on your personal page. For further information about the Big Y see my earlier blog post on the new Big Y test from Family Tree DNA.

For information on the different types of DNA tests see the beginners' guides in the ISOGG Wiki.

The Y-chromosome sequence interpretation service from YFull.com

This article is for advanced genetic genealogists who have had their Y-chromosome sequenced or who are interested in doing so.

With the forthcoming SNP tsunami, the analysis and interpretation of the Y-chromosome results provided by the various companies will be one of the key determining factors in the success of their products. Fortunately within the genetic genealogy community we have a number of intrepid pioneers who have volunteered to serve as guinea pigs by testing at all the companies so that we will eventually be able to do comparisons between all the products. David Hollister, who runs the Hollister one-name study and is the co-administrator of the Hollister DNA Project, is one of our brave guinea pigs. He has already had his Y-chromosome sequenced with Full Genomes Corporation. He has previously tested with the Genographic Project, and has had STR testing at Family Tree DNA. David is now waiting for his results from the Chromo 2 test from BritainsDNA and the BIG Y test from Family Tree DNA. Another genetic genealogist Itaï Perez has already provided a comprehensive look at the Full Genomes Y-sequencing results in a guest post on CeCe Moore's blog so I see no point in covering the same ground. However, David has recently submitted his Full Genomes data to another service by the name of YFull.com for an alternative interpretation. David was really excited by his results and was so "blown away" by the reports he received from YFull that I asked him if he might be able to share some screenshots so that other genetic genealogists might get a feel for what to expect from this service. David has very kindly agreed and has also obtained the consent of the YFull team for me to publish these screenshots. You will need to click on each image to see larger versions of the screenshots.

This is David's home page on his YFull account. Note that according to YFull there are 41,828 known Y-SNPs and 478 short tandem repeats (Y-STRs).

This report shows David's position on the Y-haplotree and his results for all the SNPs tested on his branch of the tree. Separate reports are available for "controversial" SNPs and no calls.

This report provides a list of private and unknown SNPs. 247 private and unknown SNPs were found in David's sequence: 66 were deemed to be of best quality, 10 were of acceptable quality, and 13 were of low quality. For 111 SNPs only one reading could be obtained. A temporary internal ID system is used to identify the private SNPs and they all bear the prefix YFS, an abbreviation for YFull Singleton.

This report shows results for the Indels. Indel is the term used to describe insertions and deletions - positions in the sequence where extra As, Cs, Ts and Gs have been inserted or where they are absent.

There is a handy SNP index that allows you to query your results by SNP name.

Here is the report showing results for the 478 STRs tested.

This pie chart shows the percentage of "good" and "uncertain" alleles. 90.2% of the alleles were classified as "good". Note that next generation sequencing with a read length of 100 bps does not pick up some of the longer STRs in the sequence.

YFull have recently introduced a group feature. There are currently groups available for haplogroups R1a and G2a.

YFull are based in Moscow in Russia. They are currently providing a free service for a limited period, but I understand that they will at some point start charging a small fee. They are able to use data for any Y-chromosome which has been sequenced at a minimum 25X coverage and with a read length of at least 100 base pairs. Data needs to be provided in the form of a BAM file. If you have tested with Full Genomes they will provide you with your BAM file on request. Results are not yet available from Family Tree DNA's BIG Y test but I understand that they will also make the BAM files available. It remains to be seen what level of analysis and interpretation FTDNA will provide.

We can expect the interpretation of Y-chromosome sequencing results to change over time as our knowledge improves, and as more comparative results become available. In the meantime YFull certainly provides an interesting complement to the service provided by Full Genomes. No doubt we can expect other similar services to appear on the scene in the coming months as more sequences become available.

See also
- ISOGG Y-DNA SNP testing chart
- The new Big Y test from Family Tree DNA
- A confusion of SNPs
- A simplified Y-tree and a common standard for Y-DNA haplogroup and SNP nomenclature 

© 2013 Debbie Kennett

Friday, 15 November 2013

A confusion of SNPs

This article is for experienced genetic genealogists and requires a reasonable understanding of SNPs and haplogroups.

The launch of the new Big Y test from Family Tree DNA has brought to light the difficulties in comparing the offerings of the different testing companies. We have a chart in the ISOGG Wiki which compares the various Y-SNP tests on the market but it is clear that we are not always comparing apples with apples. One of the major difficulties relates to the claims by the companies about the number of Y-SNPs on their chip. A SNP is a change or a mutation in the DNA alphabet at a single position on the Y-chromosome (eg, a C changing to a T). There are around 59 million base pairs in the Y-chromosome. However, surprising as it might be in this genomic era, there are still large sections of the Y-chromosome that have not yet been explored. Build 37, the current build of the human genome reference sequence, has only mapped out the positions of around 25 million base pairs - less than half of the Y-chromosome. The discovery of new SNPs is therefore limited to the parts of the Y-chromosome that can be sequenced using current technology. These areas represent just over 40% of the Y-chromosome. In theory, therefore, a SNP could be found on any one of the 25 million bases that can be sequenced.

The exact number of SNPs on the Y-chromosome is not yet known. There is no central resource listing all known SNPs because there is fierce competition and the companies are keen to keep knowledge of the SNPs that they have discovered from their competitors for as long as possible. We therefore have some SNPs that are in the public domain, some unpublished SNPs that are known only to Family Tree DNA/the Genographic Project, some SNPs that are known only to Full Genomes Corporation and some SNPs that are known only to BritainsDNA. To make matters worse all three companies use different naming systems for their SNPs. Full Genomes SNPs are prefixed by the letters FG, and BritainsDNA SNPs bear the prefix S. I understand from the reports from the Family Tree DNA 2013 Conference that the Genographic Project will be publishing a paper some time in the New Year with the new 2014 Y-SNP tree. It therefore remains to be seen what naming system they will use for their SNPs. There will undoubtedly be considerable overlap in the SNPs offered by the different testing companies but until they release their data or until we have comparative results available we will not be able to work out which SNPs are equivalent (synonymous) - in other words which SNPs occur at the same position but which have been given different names by different companies. For example U106 and S21 are alternative names for a single SNP which defines one of the major branches of the R1b haplogroup.
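
To illustrate what working out synonymous SNPs might involve once positions are available, here is a minimal sketch. It assumes each company's list can be reduced to pairs of SNP name and reference position. The positions, and the names FG0001 and S9999, are invented purely for illustration and are not real data. A real comparison would also need to check that the same base change is reported at that position.

```python
from collections import defaultdict

# Hypothetical records: (SNP name, position on the reference sequence).
# The positions are made up for illustration; they are not real coordinates.
records = [
    ("U106", 1000),
    ("S21", 1000),
    ("FG0001", 2000),  # invented name using the Full Genomes prefix
    ("S9999", 3000),   # invented name using the BritainsDNA prefix
]


def group_synonyms(records):
    """Return positions that carry more than one SNP name."""
    by_position = defaultdict(set)
    for name, position in records:
        by_position[position].add(name)
    # Keep only positions where two or more names coincide.
    return {
        position: sorted(names)
        for position, names in by_position.items()
        if len(names) > 1
    }


synonyms = group_synonyms(records)
print(synonyms)
```

With the invented data above, only the position shared by U106 and S21 is reported. Without the positions, as is currently the case for the chip tests, no such grouping can be attempted.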

The problem is well illustrated by the recent developments in R1b-M222, a subclade which predominates in Ireland and Scotland, and is seen in many of the surnames that are associated with the clans reputed to descend from the semi-legendary Irish historical figure Niall of the Nine Hostages. According to the early results from the Chromo 2 testing at BritainsDNA 27 new SNPs have been discovered downstream of M222.3 Yet at the Family Tree DNA Conference last weekend Miguel Vilar from the Genographic Project advised that they have identified 22 SNPs below M222. Do any of the Geno 2.0 SNPs correspond with the SNPs found by BritainsDNA? The answer is we simply do not know. Neither company releases the full raw data that will allow the participant to determine the genome reference position of the SNPs for which he has tested positive so the results from the two companies cannot be compared. Few results are in any case available at present from the Chromo 2 testing. The Genographic Project are presenting the results of their Gathering the Mayo Genes Project at a public event in Castlebar on Sunday so it may be that further information will be forthcoming then.

So where can we find out about SNPs and their position on the Y-DNA haplotree? By far the most important source is the Y-SNP tree maintained by ISOGG - the International Society of Genetic Genealogy. The tree was launched on 10th April 2006. By the end of the year there were 436 SNPs on the tree. By September 2013 there were 3610 SNPs on the ISOGG tree. According to Roberta Estes' report from Day 2 of the FTDNA conference the new 2014 Y-SNP tree, which will be published by the Genographic Project in 2014, will have 6200 SNPs and 1000 branches. This effectively doubles the size of the existing tree and will represent a significant workload for the team of volunteer project administrators who maintain the tree.

However, the ISOGG tree only documents the SNPs whose precise location on the Y-haplotree is known - in other words SNPs that define particular branches of the human family tree on the Y-line. There are thousands more known SNPs. For these SNPs we know that a mutation has been found on the Y-chromosome at the position in question but we do not know if it has any phylogenetic significance, that is, if these SNPs define branches on the Y-tree or if they are unique to the individual.

ISOGG have a SNP index that lists not just the SNPs that are on the haplotree but also those which "are or have been under active investigation and consideration for addition to the Y Haplotree." ISOGG further state that the "SNPs listed here are less than 10% of the currently known SNPs". To supplement the SNP index ISOGG member David Reynolds maintains the ISOGG SNP Compendium Spreadsheet. This was last updated about a month ago and contains a list of 47,680 SNPs which have yet to be added to the ISOGG tree and the SNP index. A small minority of these SNPs are alternative names for previously known SNPs that are already on the tree (for example, some S series SNPs correspond with some of the Z series SNPs that have already been placed on the tree). Most of the rest are SNPs whose position on the Y-chromosome is known but where we do not as yet know where they belong on the Y-tree. David Reynolds reported back in September that he had about another 5000 SNPs to process. He is "curating and combining duplicates" as he goes along so it is a time-consuming process.

There are no doubt many more SNPs that are being published in scientific papers and I don't know if anyone in the genetic genealogy community is currently keeping track of these. In one recent paper uploaded to the ArXiv preprint server two Chinese researchers discovered 25,000 new phylogenetically relevant SNPs.5

Let's now have a look at the offerings of the various testing companies in the light of these numbers. I'm discussing the companies in chronological order based on the dates when their tests were launched. Some companies offer chip-based SNP tests. These tests can only test for previously known SNPs, but the companies can customise the chips to include their own proprietary SNPs for investigation. The new gold standard tests are those which use next-generation sequencing technology. These have the potential to discover thousands of new SNPs.

The Geno 2.0 test from the Genographic Project
The Geno 2.0 test from the Genographic Project was launched in July 2012 and was the first chip test to come on the market with a comprehensive panel of Y-SNPs. The Genographic Consortium published a paper earlier this year with all the technical details of their new GenoChip.6  The supplementary data tell us that the Genographic Project started with "a raw SNP candidate database of approximately 27,500 SNPs" though some of these were duplicates. The original target was to produce a chip with 15,000 SNPs but according to the paper the chip includes around 12,000 SNPs. Customers can download a CSV file with a list of the SNPs. There were 12,059 SNPs in the most recent file that I downloaded for one of my project members. The Genographic Project do not currently provide the genome reference positions of the SNPs on their chip, and it seems likely that this information is being withheld pending publication of the 2014 tree.

The Chromo 2 test from BritainsDNA/ScotlandsDNA
The Chromo 2 test from BritainsDNA/ScotlandsDNA was launched in June 2013. It uses a customised Illumina chip which is advertised as "covering over 15,000 Y chromosome markers, carefully selected to be most informative, and as free from duplication as possible". Only a limited number of results have been released from this test so far, but a flood of results is expected in the next couple of weeks. Customers receive an Excel spreadsheet with a list of all the markers that have been tested. In the one spreadsheet that I've seen there was a list of 14,184 SNPs. Of these, 8,682 SNPs had the S prefix. On the current ISOGG 2013 Y-SNP index the S series SNPs stop at S530. In the list of SNPs that I saw there were 8385 S series SNPs with numbers higher than S530. Many of these SNPs will probably define new branches on the Y-tree but many more could simply be alternative names for currently known SNPs. We do know that the BritainsDNA chip includes SNPs found in the Genomes of the Netherlands Project, and also many SNPs that are likely to be informative for people of British descent. However, BritainsDNA, in common with the Genographic Project, do not publish the genome reference positions of their SNPs. Unless they provide ISOGG with the positions of their SNPs we will have no way of knowing where they fit on the tree and which of their SNPs correspond with those identified by other testing companies.
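
For anyone wanting to repeat this kind of tally on their own list of markers, the following is a rough sketch of one way to do it. It assumes the marker names have already been copied out of the spreadsheet into a plain list, and that an S series name is simply the letter S followed by digits. The sample names are for illustration only and are not taken from any customer's results.

```python
import re

# Assumed pattern for an S series name: the letter S followed only by digits.
S_SERIES = re.compile(r"^S(\d+)$")


def count_s_series(names, cutoff=530):
    """Count S series names, and those numbered above the cutoff."""
    total = 0
    above_cutoff = 0
    for name in names:
        match = S_SERIES.match(name.strip())
        if match is None:
            continue  # not an S series name (e.g. M222 or SRY10831)
        total += 1
        if int(match.group(1)) > cutoff:
            above_cutoff += 1
    return total, above_cutoff


# Illustrative sample only.
sample = ["S21", "S530", "S531", "S1200", "M222", "Z381", "SRY10831"]
print(count_s_series(sample))
```

The default cutoff of 530 reflects the point at which the S series stops on the current ISOGG index. It would need adjusting as the index is updated.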

Full Genomes
Full Genomes is a new start-up company which made a quiet entry onto the market some time towards the end of 2012. They only began advertising their services publicly towards the end of March 2013.7 They currently offer the most comprehensive Y-DNA test on the market covering about 20 to 25 million base pairs representing around 42% of the Y-chromosome. Full Genomes claim to cover 47,000 of the known SNPs on the ISOGG tree and in the ISOGG SNP Compendium. This is after removing "ambiguous results, and synonyms from consideration".8 Around 14 million of the SNPs are reported to be within mappable regions. However, their test is also uncovering many new private SNPs which have not as yet been made public, and the number of new SNPs discovered can be expected to rise as more and more people get tested. At present each testee in one of the common haplogroups can probably expect to find between 25 and 40 private high-quality SNPs. Full Genomes make the raw data available in a BAM file so that customers will have access to the genome reference numbers and can check the ISOGG tree for alternative SNP names as and when new SNPs are placed on the tree.

The Big Y test from Family Tree DNA
The new Big Y test from Family Tree DNA was launched at the weekend at Family Tree DNA's Conference. I've provided preliminary details in a previous blog post. As this is a new test, no results are yet available, and proper comparisons with the other available tests cannot be done. The FTDNA FAQs tell us that the test covers "at least 10 million base-pairs of reliably mapped positions of non-recombining Y-Chromosome", though the exact number of base pairs sequenced has not been disclosed. One conference attendee who spoke to the FTDNA staff was told that "the number of bp [base pairs] analysed will be at least 10 million, but could in some samples go up to 12 million".9 FTDNA claim that their test provides more coverage "than any Y-DNA test on the market". However, the test is clearly not quite so comprehensive as the Full Genomes test but it does have the virtue of being considerably cheaper which will make testing multiple people within a single subclade a feasible proposition. Confusingly FTDNA claim that the test will cover "nearly 25,000 known SNPs placing you deep on the haplotree". I can only think they've taken their figure of 25,000 known SNPs from the research into the Geno 2.0 chip and that they are seemingly unaware of the ISOGG SNP Compendium Index which, as discussed above, lists over 47,000 SNPs. If they are covering over 10 million base pairs then they will surely test most of the SNPs in the Compendium. Fortunately FTDNA have confirmed that they will make the raw data in the form of BAM files available to their customers so we will eventually be able to make comparisons.
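
To put the coverage claims side by side, the quoted base pair figures can be expressed as a share of the roughly 59 million base pairs in the Y-chromosome. This is a back-of-the-envelope sketch using only the approximate figures mentioned in this article. The labels and the rounding are my own, and the real coverage of any individual sample may differ.

```python
# Approximate length of the Y-chromosome quoted earlier in this article.
Y_CHROMOSOME_BP = 59_000_000

# Coverage ranges as quoted for each test (approximate figures).
coverage_bp = {
    "Full Genomes (low)": 20_000_000,
    "Full Genomes (high)": 25_000_000,
    "Big Y (low)": 10_000_000,
    "Big Y (high)": 12_000_000,
}


def share_of_y(bp, total=Y_CHROMOSOME_BP):
    """Percentage of the Y-chromosome covered, to one decimal place."""
    return round(100 * bp / total, 1)


shares = {label: share_of_y(bp) for label, bp in coverage_bp.items()}
for label, pct in shares.items():
    print(f"{label}: {pct}%")
```

On these figures the upper end of the Full Genomes range works out at about 42%, which matches the percentage quoted for that test, while the Big Y range comes out at roughly 17% to 20%.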

Is next generation sequencing SNP testing for you?
Next generation sequencing is clearly becoming the gold standard for SNP testing. The Genographic Project have announced that they will be introducing a new test within the next seven to 12 months and I would imagine that their new test will use next generation sequencing. No doubt a rival new NGS test is in the works from BritainsDNA too.

The new next generation sequencing Y-SNP tests do have the potential in the long run to be genealogically relevant. There is supposedly a new SNP roughly every one and a half generations. In other words, if there's no SNP found in a son then there will more than likely be a SNP in the grandson. One day the SNPs will effectively allow us to draw complete trees for Y-lines within a genealogical timeframe. As with any DNA test, a full Y-chromosome SNP test is only useful if you can compare your results with large numbers of other people so that we can work out the chronological order of the more recent SNPs and establish which ones are unique to specific lineages. With the Full Genomes test people in the common haplogroups are reportedly getting between 25 and 40 private SNPs. I imagine the numbers will be pretty similar for the Big Y test from FTDNA.

The numbers of people taking these tests are still relatively small - probably in the hundreds rather than the thousands. Even at $495 a time large-scale testing within a surname project is not going to be a practical proposition. However, if low-hanging SNPs are found that are specific to particular surname lineages then, if these SNPs are added to the a la carte menu, people could test for these single SNPs at $39 a time. STR markers can be used in combination with SNPs to predict who will be positive for which SNP, but ideally you need to be tested to at least 67 markers to make a confident prediction.

The potential problem is that FTDNA are only likely to want to invest money developing single SNPs if there are a reasonable number of people who would be willing to pay for such a test. The more recent the SNPs the fewer people will share them and consequently there will be less chance of the custom SNP tests being developed. FTDNA also only currently have the capacity to offer an additional 2000 custom SNPs. However, they have indicated that they will be re-introducing some form of static deep clade test, probably in the first quarter of 2014, which will be at a much more affordable price. SNPs found in the first phase of the Big Y testing will be candidates for inclusion on these chips so there is possibly some incentive for selected representatives of the various subclades to be tested to ensure that the key new SNPs are included in these tests. Full Genomes have also indicated that they hope to offer single SNPs, and a more economically priced SNP test, but it remains to be seen what they will offer. At the current prices NGS full Y testing is really only for people who wish to contribute to our scientific knowledge and to help delineate all the branches on the Y-tree. No doubt the costs will come down in time. Perhaps in five years or ten years the full Y test will be the norm but we're not there yet.

If you are interested in SNP testing the choice of testing company will be down to the individual and will depend on your budget and your objectives. The ISOGG SNP Testing Chart in the ISOGG Wiki provides a comparison between all the testing companies and is updated as new information becomes available. There will inevitably be new products coming onto the market in the next year with each new test appearing to have a slight advantage over its competitors until the next big thing comes along. I strongly recommend that you join the relevant haplogroup project. The group administrators are all very knowledgeable and will be able to offer good advice. There is a list of Y-DNA haplogroups in the ISOGG Wiki. Most of the projects have associated mailing lists which are currently buzzing with activity, and these will often be the best source of information and commentary.

The SNP tsunami 

The large number of SNPs that will be generated in what has been described as the SNP tsunami will represent a significant challenge for the haplogroup project admins and the citizen scientists who are trying to interpret these data. The new 2014 SNP tree from the Genographic Project, with a mere 6000 or so SNPs, will be something of an irrelevance, and by the time it is published it will be massively out of date, though it will at least lay the foundations for a new nomenclature. The volunteers who maintain the ISOGG tree will have their work cut out to keep up with the new developments. One of the team, David Dowell, has already commented: "It is clear that our processes need to be reorganized and streamlined if we are going to be able to continue to serve the genetic genealogy community and researchers in related disciplines in a timely basis."10

It seems likely that the current confusion will prevail for several months. As one poster on the U106 list has commented, the now infamous quote by Donald Rumsfeld is a very good summary of the current SNP situation:

"There are known knowns; there are things we know that we know.
 There are known unknowns; that is to say, there are things that we now know we don't know.
 But there are also unknown unknowns – there are things we do not know we don't know."11

There will be confusion, there will be chaos and there will be competition in the coming months, but from this confusion, chaos and competition many important new discoveries will emerge. I predict that as far as Y-chromosome research is concerned 2014 will be the Year of the SNP.

Updates
Vince Tilroe advises in a comment on Roberta Estes' blog that the 1.5 Y-SNPs per generation was based on the hypothetical presumption that "the entire 60 megabases [60 million bases] of the Y-chromosome could be sequenced. This is not the case by any means, and consequently a more realistic expectation should be closer to 1 Y-SNP per every 4 to 6 generations". Preliminary results from the Full Genomes testing suggest that there is around one Y-SNP every 3 to 4 generations.
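
The effect of partial coverage on these estimates can be sketched with some simple arithmetic. The sketch below takes the main text's figure of one new SNP every 1.5 generations for a hypothetical 60 million sequenced bases, and assumes that mutations are spread evenly along the chromosome so that the interval scales in proportion to the number of bases actually covered. Both the even-spread assumption and the function name are mine, so treat the results as rough illustrations rather than measured rates.

```python
def generations_per_snp(covered_bp, base_generations=1.5, base_bp=60_000_000):
    """Estimated generations between new SNPs for a given coverage.

    Assumes mutations are spread evenly, so halving the number of
    bases covered doubles the expected wait for a new SNP.
    """
    return base_generations * base_bp / covered_bp


# Coverage figures quoted in this article (approximate).
for label, bp in [("25 million bp", 25_000_000),
                  ("12 million bp", 12_000_000),
                  ("10 million bp", 10_000_000)]:
    print(f"{label}: one SNP every {generations_per_snp(bp):.1f} generations")
```

Under these assumptions, coverage of around 25 million base pairs gives a figure that falls within the 3 to 4 generations suggested by the preliminary Full Genomes results, and the interval lengthens as the coverage shrinks.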

Jim Wilson, the Chief Scientist from BritainsDNA, has provided a list of equivalent SNP names for some of the SNPs on the Chromo 2 chip. He has also advised that in due course he will be sharing the genome co-ordinates to allow comparisons with comprehensive Y-chromosome sequences. See CeCe Moore's blog post A list of alternate names for the Y-SNPs from BritainsDNA's Chromo 2 test for further details.

See also
- A simplified Y-tree and a common standard for Y-DNA haplogroup and SNP nomenclature
- The Y-chromosome sequence interpretation service from YFull
- YSEQ.net - a new company offering a single SNP testing service

References and notes
1. For further information see the ISOGG Wiki article on the Y-chromosome:  www.isogg.org/wiki/Y_chromosome
2. Moore LT, McEvoy B, Cape E et al. A Y-chromosome signature of hegemony in Gaelic Ireland. American Journal of Human Genetics 2006 78(2): 334–338. Note, however, that this study only used 59 low-resolution STR haplotypes, and many people disagree with the conclusions, both in age and origins.
3. Paterson A. Message posted on the DNA R1b1c7 list. 25 October 2013.
4. Estes R. 2013 Family Tree DNA Conference Day 2. DNAeXplained blog, 12 November 2013.
5. Wang C-C, Li H. Discovery of phylogenetic relevant Y-chromosome variants in 1000 Genomes Project data. ArXiv preprint server. Submitted 24 October 2013.
6. Elhaik E, Greenspan E, Staats S et al. The GenoChip: a new tool for genetic anthropology. Genome Biology and Evolution 2013; 5(5): 1021-31.
7. See the thread entitled Full Y chromosome sequencing: Phase III Pilot on the Anthrogenica Forum.
8. Magoon G. Message posted in the R1b-U106 mailing list, 11 November 2013.
9. See the comment thread in the private ISOGG Facebook group at https://www.facebook.com/groups/isogg/permalink/10152015234637922/.
10. Dowell D. ISOGG group gears up for SNP tsunami. Dr D Digs Up Ancestors blog, 13 November 2013.
11. For the background to the quote see the entry for Donald Rumsfeld at Wikiquote: https://en.wikiquote.org/wiki/Donald_Rumsfeld.

© 2013 Debbie Kennett

Tuesday, 12 November 2013

Genealogy in the sunshine

I am very pleased to confirm that I will be one of the speakers at the forthcoming Genealogy in the Sunshine conference which takes place in March 2014 in Rocha Brava on the Algarve in Portugal. The other speakers are:

- Chris Paton, who writes the popular British Genes blog, but is also the author of several books and a well known speaker

- Else Churchill, the genealogist at the Society of Genealogists

- John Hanson, a fellow member of the Guild of One-Name Studies who is now the Research Director of the Halsted Trust, a charitable organisation which promotes one-name study research but with particular reference to the surname Halsted 

- Donald Davis, who has done some "ground-breaking research into the householder schedules for the 1841 census". He will be explaining "how his discovery can help us interpret the entries that we see in the enumerators' books".

I feel very honoured to be included in such an "all-star" line-up.

The full programme has yet to be worked out but there are only a very limited number of places so if you want to participate you need to get your booking in early. Because of the restricted number of places there will be the opportunity for one-to-one consultations so bring your computer and I can have a look at your DNA results and help you to understand them!

Further information about Genealogy in the Sunshine can be found in the current edition of the Lost Cousins newsletter.

It should be a good event. The only other time I've ever been to Portugal was back in 1990 when I had a lovely holiday in the quiet resort of Praia da Luz, which is now better known for the disappearance of Madeleine McCann. It will be interesting to see how much Portugal has changed in the intervening years.

The Family Tree DNA 2013 conference

Family Tree DNA's 9th Genetic Genealogy Conference for FTDNA Group Administrators took place in Houston, Texas, from 9th to 10th November 2013. The full conference schedule can be found here. One of these days I would love to go to Houston to attend an FTDNA conference for myself but for now I have to follow the news on Twitter and hope that those who have attended the conference will share their experiences during and after the event. Fortunately this year we've had some wonderful coverage from the genetic genealogy blogging community and their reports are so thorough that it's almost as good as being there. I've provided the links below for all the reports that I've seen so far. I will update the list with any new content that is discovered. Do let me know if there's anything that I've missed.

Jennifer Zinck, who writes the Ancestor Central blog, has written three excellent and very detailed reports:

- Get your DNA test here! A tour of the Family Tree DNA lab

- 9th International Conference on Genetic Genealogy Day 1

- 9th International Conference on Genetic Genealogy - Day 2

Roberta Estes who writes the DNAeXplained blog has written a series of blog posts reporting from the conference and also a pre-conference preview:

- 2013 Family Tree DNA conference Day 1

- 2013 Family Tree DNA conference Day 2

- 9th International Conference on Genetic Genealogy

- 9th Annual Conference Reception

- 10 year pioneers recognized by Family Tree DNA

- Gene by Gene Genomics Research Center Lab Tour

Emily Aulicino, who writes the Genealem blog, has provided two reports from the conference:

- 2013 Family Tree DNA International Conference Day 1

- 2013 Family Tree DNA International Conference Day 2

Dave Dowell, who writes the Dr D Digs Up Ancestors blog, has provided the following two posts on the conference:

- FTDNA Conference Part 1

- ISOGG gears up for SNP tsunami

David Mittelman, Family Tree DNA's Chief Scientific Officer, has provided a very nice Storify version of the tweets and images emanating from the FTDNA conference:

Family Tree DNA 2013 Conference Storify.

Finally, you might also want to check out my blog post on The new Big Y test from Family Tree DNA. This has attracted a remarkable amount of worldwide attention in the last few days and is already the third most viewed post that I've ever written!

This page was last updated on 14 November 2013.

© 2013 Debbie Kennett

Saturday, 9 November 2013

The new Big Y Test from Family Tree DNA

Family Tree DNA have just announced at their conference the introduction of a new Y-chromosome DNA test to be known as the Big Y. The new test uses next generation sequencing technology which is much more reliable for the Y-chromosome than the chip testing used for the Geno 2.0 test from the Genographic Project and the Chromo 2 test from BritainsDNA. The Big Y is intended as a replacement for the Walk through the Y test which used the slower and much more expensive Sanger sequencing technology.

The Big Y test covers 10 million base pairs. It will provide results for almost 25,000 of the known Y-SNPs. However the exciting part is that this test can also be used for SNP discovery, opening up the tantalising possibility of finding SNPs that will prove to be unique to a particular surname lineage or an individual branch of a family tree.

The introductory price of the new test is $499 (£311). The selling price from 1st December will be $695 (£434). The test currently only seems to be available to existing customers, but it may be that the FTDNA home page hasn't yet been updated. There is now a new splash page on the personal pages of male FTDNA customers.

People who have ordered the old Walk through the Y test will receive a voucher giving them an additional $50 off the cost of the new test. The new voucher (coupon) will be available on personal pages from Monday.

No doubt further information will be forthcoming in due course from the people who have attended the conference. It has been possible to pick up quite a bit of the news from the conference tweets (search on Twitter using the hashtag #FTDNA2013).

Details of all the other currently available SNP tests can be found on the ISOGG Y-SNP testing comparison chart. This will now need to be updated to include details of the new Big Y test.

In other news from the conference it would appear that the Genographic Project will be introducing a new Geno test within the next seven to 12 months. It may be that the next Geno test will also use next generation sequencing technology.

Updates
A set of FAQs on the Big Y test is now available on the FTDNA website at: https://www.familytreedna.com/learn/y-dna-testing/big-y/

The ISOGG Y-SNP testing comparison chart in the ISOGG Wiki has now been updated to incorporate details of the new Big Y test.

Roberta Estes has advised on her blog that results from the Big Y will be delivered in 10-12 weeks and the results will be accompanied by comparison tools. See her blog post here.

Jennifer Zinck has provided a very nice detailed summary of the first day of the FTDNA conference with further information on the talk on next generation sequencing by David Mittelman, FTDNA's new Chief Scientific Officer, in which the new Big Y test was announced. Click here to read Jennifer's blog post.

CeCe Moore has blogged briefly about the Big Y test. Most importantly she has received confirmation from David Mittelman of Family Tree DNA that Big Y customers will be able to download their raw data files on request.

See my blog post on The Family Tree DNA Conference 2013 for a compilation of all the blog posts and other coverage from the conference.

Related blog posts
- The Big Y roll out - the SNP tsunami is on its way!
- A confusion of SNPs
- The Y-chromosome interpretation service from YFull.com

Friday, 8 November 2013

A simplified Y-tree and a common standard for Y-DNA haplogroup and SNP nomenclature

This article is for experienced genetic genealogists and requires an understanding of SNPs and haplogroups.

A very useful online resource for Y-chromosome researchers in the form of a simplified version of the Y-chromosome SNP tree has come online this week. The new pared-down version of the Y-tree is introduced in a paper by Mannis Van Oven, Anneleen Van Geystelen, Manfred Kayser, Ronny Decorte and Maarten H D Larmuseau entitled Seeing the Wood for the Trees: A Minimal Reference Phylogeny for the Human Y Chromosome. The paper has been accepted for publication in the scientific journal Human Mutation but has yet to go through the full editorial process. Mannis Van Oven's name is already well known to mitochondrial DNA researchers because he maintains the Phylotree website which hosts the definitive mtDNA tree. The simplified Y-tree is conveniently being maintained on the same website and can be found at www.phylotree.org/Y

The new Phylotree version of the Y-tree will serve as a complement to the full Y-SNP tree which is maintained by ISOGG (the International Society of Genetic Genealogy). The Y-tree is now a very complicated structure and is set to become even more detailed in the coming months with the flood of new Y-SNPs that are being discovered from academic projects and through commercial testing with Full Genomes Corp, the Genographic Project (Geno 2.0) and BritainsDNA/ScotlandsDNA (Chromo 2). There will always be a need to have the fine detail of the full high-resolution tree, especially when one is trying to drill right down to the low-hanging branches. However, sometimes it's useful to get an overview of the structure of the tree as a whole without the complication of all the additional sub-branches, twigs and twiglets, and this is something that the new Phylotree Y-tree does very nicely.

I'm very pleased to see that the paper acknowledges the contributions made by the many "independent researchers" within the genetic genealogy community. The resources that the authors used to compile their reference phylogeny included "a large number of websites maintained by independent researchers", all of whom are named in the acknowledgements.

An important innovation in this paper is a very welcome attempt to introduce a much-needed common standard for Y-SNP and Y-haplogroup nomenclature. As the authors explain, "Due to multiple independent discovery events, a considerable number of Y-SNPs are known by multiple names". This diversity of names is a source of considerable confusion for both academic researchers and genetic genealogists. For example, haplogroup R1b1a2, the predominant European haplogroup, has two major branches. The markers that define these branches are known as P312 and U106 at the Genographic Project and Family Tree DNA but have the alternative names S116 and S21 at BritainsDNA/ScotlandsDNA. All four of these marker names appear in the scientific literature but the scientists often don't provide the alternative names. ISOGG provides a Y-SNP index which allows the researcher to check for other SNP names but not every researcher will know of this resource. The solution proposed by Van Oven et al is to decide on "one default name depending on which of the aliases is most frequently used in the literature", and these are the names which appear in the Phylotree Y-tree, though the alternative names are given in the accompanying spreadsheet.
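
For the computer-literate, the "one default name" idea can be sketched in a few lines of Python. This is only an illustration and not the authors' actual procedure: the alias groups are the ones mentioned in this post, but the mention counts are invented for the example, and the function names are my own. The output should therefore not be read as the real Phylotree defaults.

```python
from collections import Counter

# Alias groups mentioned in this post; each set names a single SNP.
ALIAS_GROUPS = [
    {"P312", "S116"},
    {"U106", "S21"},
    {"L21", "M529", "S145"},
]

# HYPOTHETICAL counts of how often each name appears in the literature.
# These numbers are invented for illustration and are not real data.
LITERATURE_MENTIONS = Counter(
    {"P312": 40, "S116": 12, "U106": 35, "S21": 9, "L21": 20, "M529": 25, "S145": 3}
)


def default_name(aliases, mentions):
    """Pick the alias with the most mentions; break ties alphabetically."""
    return min(aliases, key=lambda name: (-mentions[name], name))


def build_alias_index(groups, mentions):
    """Map every alias to the default name chosen for its group."""
    index = {}
    for group in groups:
        chosen = default_name(group, mentions)
        for alias in group:
            index[alias] = chosen
    return index


ALIAS_INDEX = build_alias_index(ALIAS_GROUPS, LITERATURE_MENTIONS)
```

With an index like this, any name found in a paper or a results file can be translated to a single agreed label with a lookup such as ALIAS_INDEX["S116"], while the alternative names remain available for reference.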

It does of course remain to be seen if the scientists and testing companies will adopt the recommended nomenclature for the 417 SNPs included on the simplified Y-tree, but we can certainly hope that they will do so. Most of the names are already in use at Family Tree DNA and within the various FTDNA haplogroup projects. The one SNP name on the tree which will probably cause the most difficulties is R-M529, which is currently better known as L21 and sometimes S145. The name M529 seems to have been chosen because it was cited in an academic paper published in 2011 by Myres et al.1 However, the name L21 is now so ingrained in the collective genetic genealogy consciousness that I suspect that the proposed new name will probably not catch on. BritainsDNA have always used their own proprietary S series naming system but I hope that they will at least consider adopting the new nomenclature for the core SNPs included on the Phylotree Y-tree so that we can all speak a common language.

In the coming months we can expect an explosion of new Y-SNPs now that the first results have started to come in for the Chromo 2 test from BritainsDNA/ScotlandsDNA and from the full Y-chromosome sequencing tests at Full Genomes. However, the nomenclature will continue to be a big problem as each company tries to maintain a competitive advantage. Full Genomes have already indicated that they will be offering custom single SNPs for sale to compete with FTDNA. We can probably expect to see a flood of FG SNPs being made available in the next few months. The positions of the new FG SNPs on the tree are not yet known so no other companies will be able to offer these new SNPs. So far I've only seen one data file from the BritainsDNA Chromo 2 test. This file contains over 14,000 Y-SNPs, of which around 8,000 or more are proprietary S series SNPs, only a tiny percentage of which are listed in the ISOGG Y-SNP index. It may be that many of the BritainsDNA SNPs will turn out to be equivalent to the SNPs that are already on the ISOGG tree or included on the Geno 2.0 chip, and these SNPs will almost certainly be included in the Full Genomes test. However, neither BritainsDNA nor the Genographic Project provide the genome reference positions for the SNPs on their chips so there is currently no way of knowing which S series SNPs are already known about and which ones are new. Fortunately there are many pioneers with deep pockets in the genetic genealogy community who can afford to have their DNA tested at Full Genomes, BritainsDNA and the Genographic Project. With data available for comparison from two or more companies it should then be possible for the volunteer haplogroup project administrators to compare the results and establish the positions of any newly discovered SNPs on the Y-tree.
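
To give a flavour of what such a comparison might involve, here is a minimal Python sketch. It assumes something that is not yet true for the chip tests, namely that each company's results can be reduced to a SNP name and a position on the same genome reference. The company labels, SNP names and positions in the usage example are made-up placeholders, and the function names are my own.

```python
from collections import defaultdict


def group_by_position(results):
    """Group (company, snp_name, position) records by reference position."""
    by_position = defaultdict(set)
    for company, name, position in results:
        by_position[position].add((company, name))
    return by_position


def classify(results):
    """Split positions into those reported by two or more companies
    (candidate equivalent names) and those seen at one company only."""
    equivalents, unmatched = {}, {}
    for position, entries in group_by_position(results).items():
        names = sorted({name for _, name in entries})
        companies = {company for company, _ in entries}
        if len(companies) > 1:
            equivalents[position] = names
        else:
            unmatched[position] = names
    return equivalents, unmatched
```

Calling classify on a list such as [("CompanyA", "A-1", 1001), ("CompanyB", "B-7", 1001), ("CompanyA", "A-2", 1002)] would pair A-1 with B-7 as candidate equivalents at position 1001 and leave A-2 unmatched. A real comparison would also need to check that the reported mutation is the same at each shared position.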

The other unknown is whether or not Family Tree DNA will be responding to the competition from Full Genomes and BritainsDNA. Their group administrators' conference is taking place this weekend in Houston, Texas, and the conference schedule has now been made available online. Miguel Vilar from the Genographic Project will be providing a Geno 2.0 update and talking about the Y-2014 tree, and Michael Hammer will be talking about the "implications of the 2014 Y-tree". FTDNA usually make a big announcement at the conference and the speculation is that they will perhaps be announcing the launch of a new Geno chip and/or the introduction of a full Y-chromosome test. Spencer Wells has already indicated that a new Geno chip might be on the way as early as 2014.2

Unfortunately, all three currently available Y-SNP tests are very expensive and well beyond the means of the average genetic genealogist. I'm rather hoping that at some point one of the companies will introduce a cheaper Y-SNP test that will allow a customer to have a refined haplogroup designation sufficient to rule out false positive matches but without breaking the bank.

For the moment I would advise anyone considering ordering a Y-SNP test to wait and see what the results are from the tests taken by the early adopters. If you want to join the pioneers and experiment with one of the new SNP tests then you can see a chart comparing the services offered by the main testing companies in the ISOGG Wiki.

With so many exciting new developments I wonder what the Y-chromosome tree will look like in 2014. The ISOGG SNP Index lists all the SNPs that are either on the Y-tree or which are under investigation, but these SNPs represent less than 10% of the known Y-SNPs. David Reynolds maintains the ISOGG Y-SNP Compendium Spreadsheet which currently contains almost 40,000 additional Y-SNPs, and has indicated that he still has over 12,000 SNPs to add, time permitting. The SNPs in this spreadsheet have not all been validated and many are not available for testing at any commercial company. It may well be that the tree will increase in size ten-fold or more in the next twelve months which will represent a significant challenge for the volunteer ISOGG Y-SNP team who maintain the tree in their own free time.

Chris Tyler-Smith cautioned us in February at a special ISOGG presentation at the Sanger Institute in Cambridge that the Y-tree nomenclature system was set to break down in 2013, and indeed that already seems to be the case. He raised the possibility of using an ancestral reference sequence for the Y-chromosome along the lines of the RSRS (Reconstructed Sapiens Reference Sequence) introduced for mitochondrial DNA in 2012.3 I wonder if that is something that we will see implemented in 2014.

Whatever the future has in store it is certainly a very exciting time for Y-chromosome researchers and, as Chris Tyler-Smith commented in February, there will be "more opportunities than ever for computer-literate citizen scientists".

References
1. Myres NM, Rootsi S, Lin AA et al. A major Y-chromosome haplogroup R1b Holocene era founder effect in Central and Western Europe. European Journal of Human Genetics 2011; 19 (1): 95-101.
2. Petrone J. National Geographic considering move to new SNP chip for Genographic Project. GenomeWeb, 13 August 2013.
3. Behar DM, Van Oven M, Rosset S et al. A "Copernican" reassessment of the human mitochondrial DNA tree from its root. American Journal of Human Genetics 2012; 90 (5): 936. 

Resources
The ISOGG Y-DNA SNP testing comparison chart
A list of Y-DNA haplogroup projects
BritainsDNA haplogroup nicknames

See also
- A confusion of SNPs
