Thursday 26 September 2013

The pick of the papers from ASHG 2013

The annual meeting of the American Society of Human Genetics takes place in Boston, Massachusetts, from 22nd to 26th October. The abstracts and posters are now up on the ASHG website and can be searched here. I have provided links below to the abstracts which were of particular interest to me as a genetic genealogist. The numbers before the titles are the abstract or poster numbers.


109. Harnessing Web 2.0 Social Networks for Genetic Epidemiology Studies with Millions of People.

112. Whole-genome sequence based association studies of complex traits: the UK10K project.

181. Patterns of IBD sharing inferred from whole genome sequences of 962 European Americans.

182. Reconstructing the Genetic Demography of the United States.

188. Fast and accurate pedigree-based imputation from sequenced data in a founder population.

192. Haplotype phasing across the full spectrum of relatedness.

268. Simultaneous estimation of population size changes and splits times from population level resequencing studies.

269. Inferring complex demographies from PSMC coalescent rate estimates: African substructure and the Out-of-Africa event.

270. Out of Africa, which way?

271. Insights into the genetic architecture of African genomes: the African Genome Variation Project.

274. Reconstructing the Population Genetic History of the Caribbean.

340. The Ashkenazi Jewish Genome.

341. Rare variant sharing reveals population histories.

347. Inferring ancient demography using whole-genome sequences from multiple individuals.

348. Inferring human population history and gene flow from multiple genome sequences.

349. A haplotype map derived from whole genome low-coverage sequencing of over 25,000 individuals.

350. Statistical estimation of haplotype sharing from unphased genotype data.

351. HapFABIA: Identification of very short segments of identity by descent (IBD) via biclustering.

352. A new method for genotype calling and phasing for the 1000 Genomes Project leads to improved downstream imputation accuracy.

404. Identification of Y chromosomes associated with risk for prostate cancer.

ASHG Posters

1911T. The visualization of probabilistic results from consumer genetic testing for ethnicity at AncestryDNA.

1861F. Computationally-efficient long-range phasing with very large datasets.
A poster from the AncestryDNA team.

1974T. Surveying European and West African Population Structure Using >2,300 Samples with Spatial Information.
A poster from the AncestryDNA team.

1944T. Geographic Population Structure (GPS) of worldwide human populations infers biogeographical origin down to home village.
A study using the Genographic Project's Geno 2.0 GenoChip.

1958W. Y chromosomes in surname samples: insights into surname frequency and origin.
A study of 50 different Catalan surnames.

1963F. Evidence of social marginalisation leading to strong genetic differentiation among the Ari of Ethiopia.
A cautionary tale about the use of the clustering program ADMIXTURE.

1964W. Pinpointing the Indian origin and revealing the Caucasus chapter in the genetic ancestry of the European Roma.

2038F. The Iranian Genomes Project.

2043T. Resequencing of Australian Aboriginal mtDNA and Y chromosomes.
The authors report the finding of around 3000 new Y-SNPs.

2046T. Juxtapositions of short IBD blocks can cause biased estimation in inferences based on the length of IBD blocks.

2059F. Synthesizing genetic and genealogical data to trace historical waves of European and African immigration to the United States.
A poster from the AncestryDNA team.

2061T. The Saudi Arabian Genome Reveals a Two Step Out-of-Africa Migration.

2062F. The CARTaGENE Genomics Project: Population structure, local ancestry contributions and relatedness analysis of the French Canadian founder population.

2063W. Reconstruction of Ancestral Human Genomes from Genome-Wide DNA Matches.
A poster from the AncestryDNA team.

2100T. Mitochondrial Genome Database for Saudi Community.

2429W. Utility of the X chromosome pattern of inheritance: the identification of close relatives through direct-to-consumer (DTC) genetic testing.
A poster from ISOGG member Kathy Johnston.

2433W. Genetic privacy in the European Union - exploring the impact of the proposed Data Protection Regulations.

We can look forward to the results of some exciting research in the next year and an explosion of new autosomal DNA datasets. It is, however, somewhat disappointing to see that very little research is now focused on the Y-chromosome and mitochondrial DNA. While the autosomal DNA studies are sequencing genomes at very high resolution, the Y-DNA studies are still mostly using a small number of STR markers.

Tuesday 17 September 2013

My updated ethnicity results from AncestryDNA - a British perspective

AncestryDNA announced last week that they were starting to roll out a free update to their ethnicity results. I noticed today that my updated results were now available. The beta version of AncestryDNA's ethnicity results was widely criticised. Many American customers found that they had much higher percentages of Scandinavian ancestry than expected. As one of the few British customers in the AncestryDNA database I was surprised to find that many of my American friends and genetic cousins had significantly higher percentages of "British" ancestry than me. AncestryDNA also failed to provide any background information on the reference populations used, thus rendering the results essentially meaningless. The new ethnicity results are a slight improvement but, as with all these admixture analyses, still have a long way to go before they can provide any useful information.

When you sign into your Ancestry account you are first of all presented with your old ethnicity results. If you have access to the new ethnicity results you will see a big orange label to click on. As can be seen, my original results from AncestryDNA were 58% Central European, 28% British Isles, 13% European and 4% uncertain.
According to my family history research all my documented ancestors as far back as I can trace them are from the British Isles and predominantly from England. I know the names and birth places of 15 of my 16 great-great-grandparents and they are all English. In this generation I have one illegitimate line which has prevented me from finding out the name of the remaining ancestor. The birthplaces of these 15 great-great-grandparents are: Burrington, Devon; Bristol (2); Thornbury, Gloucestershire; Clapham, London; Colchester, Essex; Sandon, Hertfordshire; Limehouse, London; Bermondsey, London; Merriott, Somerset; Sydenham, Kent; Sydmonton, Hampshire; Kintbury, Berkshire; Westminster, London; Sherston, Wiltshire.

I know the names of 27 of my 32 great-great-great-grandparents, but I only know the birth places of 21 of these ancestors. All of my known ancestors in this generation are again from the British Isles. These are the birth places where known: Ashreigney, Devon; Mariansleigh, Devon; Thornbury, Gloucestershire; Bristol; Great Yeldham, Essex; Preston, Hertfordshire; Sandon, Hertfordshire; Scotland (place not known); Hackney, London; Laverstoke, Hampshire; County Kerry, Ireland; Merriott, Somerset; Rickmansworth, Hertfordshire; Shoreditch, London; Ecchinswell, Hampshire; Welford, Berkshire; Kintbury, Berkshire; Salford, Bedfordshire; Holborn, London; Leighterton, Gloucestershire; Purton, Wiltshire.

The new Ethnicity Estimate 2.0 from AncestryDNA divides the population clusters into 26 global regions. Europe is subdivided into the following regions: Great Britain, Ireland, Europe West, Iberian Peninsula, Finnish/Northern Russia, Italy/Greece, Scandinavia, Europe East and European Jewish. My updated ethnicity percentages from AncestryDNA can be seen below. The percentages are as follows: Europe West 47%, Great Britain 21%, Ireland 20%, Iberian Peninsula 8%, Finnish/Northern Russia 2%, Italy/Greece <1%, Scandinavia <1%.
Ancestry provide somewhat contradictory information on the number of SNPs used for the ethnicity inferences. In their introductory help pages they state that they have increased the number of comparison points (markers) used to determine ethnicity from 30,000 to 300,000. Elsewhere they tell us that they are using "100,000 highly informative SNPs". Your DNA is now analysed more than 40 times to come up with the best estimate and a personalised range. The screenshot below shows the range of results for my "Great Britain" admixture which varied from a low of 0% to a high of 49% in the 40 runs through my DNA. The midpoint of 21% was picked as the best estimate. My results were then compared with "natives" from the region. A "typical native" of Great Britain supposedly has 60% admixture from Great Britain.
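
To make the idea of a "personalised range" more concrete, here is a minimal sketch in Python of how a set of repeated runs for one region could be boiled down to a low, a high and a single best estimate. It is purely illustrative: the function name `summarise_runs` is my own, the run values are invented, and I have assumed that the best estimate is a simple average of the runs, which Ancestry's help pages do not spell out.

```python
from statistics import mean


def summarise_runs(run_percentages):
    """Return (low, high, average) for one region across repeated runs.

    Assumption: the single reported estimate is the plain average of the
    runs. This is a guess for illustration, not Ancestry's documented method.
    """
    if not run_percentages:
        raise ValueError("at least one run is required")
    return min(run_percentages), max(run_percentages), mean(run_percentages)


# Hypothetical run values, invented for this example only.
# They are NOT my actual per-run results, which Ancestry does not show.
example_runs = [0, 12, 18, 26, 49]

low, high, average = summarise_runs(example_runs)
midpoint = (low + high) / 2

print(f"range: {low}% to {high}%")
print(f"average of runs: {average}%")
print(f"midpoint of range: {midpoint}%")
```

With made-up values like these, the average of the runs does not have to coincide with the arithmetic midpoint of the lowest and highest values, which is worth bearing in mind when reading a single headline percentage alongside a very wide range.
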
Ancestry explain that what they call the "Great Britain region" is "more admixed than most other regions". They provide examples from their reference populations showing the range of results found with percentages varying from 41% to 100% (see the screenshot below). My 21% from Great Britain obviously makes me a very untypical native! However, the only other British person I know who has tested with AncestryDNA has actually come out even less "British" than me with just 10% admixture from Great Britain and 12% from Ireland. In contrast the American genetic genealogy blogger Blaine Bettinger has reported that his AncestryDNA results show that 55% of his admixture is from Great Britain and 7% is from Ireland. Another American blogger, Judy Russell, who writes the popular Legal Genealogist blog, now finds that, according to AncestryDNA, 49% of her admixture is from Great Britain. I note, however, that the reference population for the "Great Britain region" consists of a mere 195 samples, which is nowhere near adequate to represent the genetic diversity of a population of over 61 million. Ancestry also have a reference population of just 154 people to represent the people of Ireland, and just 416 samples to represent the "Europe West" region which encompasses France, Germany, Switzerland, Austria, the Low Countries, the Czech Republic and northern Italy.
Ancestry also show the percentages from other regions that were found in their Great Britain reference samples:
Ancestry have now provided more details about the reference populations used for their analysis, and have provided a detailed White Paper explaining the methodology behind the calculations. They explain that the reference panel was compiled from "a set of 4,245 DNA samples collected from people whose genealogy suggests they are native to one region". The reference panel candidates included "over 800 HGDP samples, over 1,500 samples from the proprietary AncestryDNA reference collection, and over 1,800 AncestryDNA customers who have explicitly consented to be included in the reference panel". These 4,245 samples were whittled down to provide a final reference panel of 3,000 samples. The 195 samples from Great Britain were reduced to just 111 samples in this process, and the number of samples from Ireland was cut from 154 to 138.
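
The sample counts quoted above are easier to compare as percentages. The short Python sketch below is my own arithmetic rather than anything taken from the White Paper, and the helper name `share` is made up for the purpose. Note that the customer figure is quoted only as "over 1,800", so the customer share is a minimum, and it is measured against the 4,245 candidates because the make-up of the final panel by source is not given.

```python
def share(part, whole):
    """Return part as a percentage of whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Figures as quoted from the White Paper in the text above.
candidates = 4245          # candidate reference samples
final_panel = 3000         # samples kept in the final reference panel
customers_at_least = 1800  # "over 1,800" customers, so a lower bound

print("Candidates kept in final panel:", share(final_panel, candidates), "%")
print("Customers among candidates (at least):",
      share(customers_at_least, candidates), "%")
print("Great Britain samples kept:", share(111, 195), "%")
print("Ireland samples kept:", share(138, 154), "%")
```

On these figures a noticeably smaller proportion of the Great Britain samples survived the filtering than of the Irish samples.
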

It is not explicitly stated but I presume that the proprietary reference collection is the Sorenson Molecular Genealogy Foundation database which Ancestry acquired in March 2012. The participants in the SMGF database provided their samples for a non-commercial research project and not for use by a large profit-making company. If the SMGF samples were re-analysed by AncestryDNA then they would be ethically obliged to get consent from the participants for the re-use of their data. It is not clear if this has actually happened.

Almost half of the samples used in the AncestryDNA reference panel were provided by AncestryDNA customers. I presume that these are customers who signed the consent form to participate in AncestryDNA's Human Genetic Diversity Project. As I have written previously, I decided not to participate in this project as I could find no published information to describe what the project entailed. I was also concerned at the somewhat deceptive way in which the consent form was muddled up with the standard terms and conditions, potentially allowing people to join the "project" without providing their informed consent. The AncestryDNA test is currently only on sale in the US. I am one of only a handful of people outside the US who ordered the test in the beta-testing phase before Ancestry stopped shipping kits overseas. Therefore almost half the so-called reference samples provided for the AncestryDNA test are provided by Americans. This will inevitably introduce biases into the reference samples as the people who emigrated to America will not necessarily constitute a random sample of the population of Europe. For example, disproportionate numbers of people emigrated to America from Ireland. This bias no doubt explains why, in the few results seen so far, British people are coming out with much lower percentages from the "Great Britain region" than their American counterparts. Americans of British origin will no doubt be a good proxy for other Americans of British origin but it makes no sense to use British Americans as a reference population for "native" British people. Ancestry do also make it clear in their White Paper that they had difficulty differentiating the population of Great Britain from the rest of Western Europe. Samples from Great Britain were being "mis-assigned a significant amount of Western European ethnicity" and vice versa. My unexpectedly high Irish percentage is also presumably an artefact of the biased sampling process.

The use of an all-American reference population of AncestryDNA customers also explains the decision to lump England, Scotland and Wales together into one large "Great Britain region", and to mix the Republic of Ireland and Northern Ireland together into one "Ireland" region. It would have been much more interesting to split the British Isles up into the four constituent countries, but Ancestry clearly did not have sufficient samples with detailed genealogies from each country to do this, again because the reference samples were mostly from America rather than the British Isles. This once again calls into question Ancestry's decision to market their DNA test exclusively in the US. As most Americans are very interested in finding out more about their ancestry in Europe you would have thought it would be in Ancestry's interests to make their test available in other countries. This would have the added benefit of bringing in many more customers with four grandparents all born in the same country who could be used to provide more representative reference samples. If the AncestryDNA test is ever launched in other countries there is now going to be very little incentive for non-Americans to test as they will be overwhelmed with large numbers of distant cousins in America with little chance of ever finding the connection and no tools to filter out these large numbers of matches.

Ancestry do not provide detailed information about the timeframe which is covered by the new ethnicity estimates though they do explain that the results are provided as an "estimate of the ancient historical origins" of their customers' DNA. They add that "While this information is less relevant for genealogical research relating to the last five to ten generations, it may reveal intriguing clues about the distant history of one’s ancestors."

Even though my admixture results from the new Ethnicity Estimate 2.0 are no better than the estimates from the old beta test, Ancestry have at least responded to the criticisms and have now given details of the reference populations used and have provided us with a commendably detailed technical White Paper, though I cannot understand why such basic features were not included right from the outset. It seems to me that AncestryDNA would have been better off investing their time and energy in providing much-needed matching segment data for their customers rather than tinkering with their "ethnicity" results. These admixture tests are still very much in their infancy and they currently have very little practical application for family history purposes. If you want to have some fun with your DNA results you can get alternative "readings" from the many people who provide a free analysis service. For further details see the ISOGG Wiki page on admixture analyses. In the meantime, if you wish to know your "ethnicity" you should carry on researching your family tree in the traditional way using the paper-based records.

© 2013 Debbie Kennett

Monday 16 September 2013

Rockstar genealogists

I'm very flattered to have been awarded a silver medal in John Reid's Rockstar Genealogists poll which has been running on his Anglo-Celtic Connections blog for the last few weeks.

John's description of a rockstar genealogist is as follows:
Rockstar genealogists are those who give "must attend" presentations at family history conferences or as webinars. Who, when you see a new family history article or publication by that person, makes it a must buy. Who is it that you hang on their every word on a blog, podcast or newsgroup, or follow avidly on Facebook or Twitter?
There was a very long list of names in the nominations which can be found here. I'm very honoured that so many people have voted for me. I'm even more surprised to find that I've been nominated ahead of big names such as Dick Eastman and Else Churchill. The full list of bronze and silver medallists can be found here. The names of the "superstar" gold medallists will be published on John's blog tomorrow.

Thursday 5 September 2013

Private Eye on BritainsDNA

The satirical magazine Private Eye have a short article in this week's issue (No. 1347, 23 August to 5 September 2013, p31) on two very different stories about red hair that were covered by the press at the end of August. The first story concerns a study of a gene mutation found in people with red hair and pale skin which might explain their increased risk of melanoma. This research was published in the respected scientific journal Molecular Cell. Some newspapers, including the Daily Mail and the Telegraph, gave equal coverage to a "'groundbreaking' study from BritainsDNA, suggesting that more than 20m people in the British Isles have genes that produced red hair". The BritainsDNA red hair study is potentially an interesting piece of research and the press release includes a useful map showing the distribution of red hair in the British Isles. Yet, as Private Eye point out, the study "is not peer-reviewed and is published only on the BritainsDNA website 'to coincide with' (read 'cash in on') last month's Redhead Convention in County Cork."

BritainsDNA promise us on their website that they will "participate in academic conferences designed to peer-authenticate and disseminate new findings and interpretations" and that they will "publish findings from its research programme in peer-reviewed publications". ScotlandsDNA was founded in 2011, and their sister company BritainsDNA was set up the following year. It does of course take time for research to be written up and to go through the peer-review process, but it is surprising that, despite the large amount of press and media coverage devoted to the findings of their research, there has still not been a single paper published in a scientific journal. I hope that we do not have to wait too much longer to see some of their research published in the scientific literature in the usual way.

See also
- Private Eye on Prince William's DNA
- BritainsDNA, The Times and Prince William - the perils of publication by press release
- Sense about genetic ancestry testing
- I don't know what to believe: making sense of science stories. A useful publication from Sense About Science explaining the importance of the peer review process.