Sunday, 2 June 2019

The end of public participation in the Genographic Project

It is the end of an era. The National Geographic Genographic Project has announced that the public participation phase of the project has been closed as of 31st May 2019. It is no longer possible to order a Genographic kit, but existing orders will be fulfilled within a limited timeframe with the date varying depending on which kit was ordered. There is further information on the Genographic Project website:


The Genographic Project has provided a detailed set of FAQs:


As of today's date, the Genographic Project has sold 997,222 kits in 140 countries.

There are no doubt many kits still waiting to be returned and it's possible that the project will eventually pass the one million milestone.

This was an almost inevitable development after Rupert Murdoch bought out the media arm of the National Geographic and ended its not-for-profit status. The new for-profit arm was renamed National Geographic Partners and went into partnership with Disney in March this year. The National Geographic Society continues to operate as a non-profit organisation.

The Genographic Project was not without controversy. See for example the essay The brave new era of human genetics by Hans-Jurgen Bandelt, Yong-Gang Yao, Martin Richards and Antonio Salas published in 2008. The Native American researcher Kim Tallbear published a critique Narratives of race and indigeneity in the Genographic Project in 2007. Many population geneticists were critical of the fancy Y-DNA and mtDNA haplogroup stories provided as customer reports. Ancient DNA testing has now shown that we cannot use the DNA of living people to make inferences about past populations.

However, many genealogists first discovered the joys of genetic genealogy by testing at the Genographic Project. After transferring their DNA results to FamilyTreeDNA many people were then inspired to start their own surname projects, haplogroup projects and geographical projects.

The Genographic Project collected DNA from nearly 100,000 people from indigenous populations around the world. I understand they were waiting for the costs of whole genome sequencing to come down before starting to analyse all the data. This is a valuable resource and the scientific research will continue so we can look forward to many more interesting publications.

Anyone who has tested at the Genographic Project can transfer their data to the FamilyTreeDNA database:


Note, however, that Helix kits, which were sold exclusively in the US, cannot be transferred.

Genographic transfers will have the kit number prefixed by the letter N. Judging by the kit numbers in my projects at FTDNA, well over 200,000 people have already transferred their Genographic results to FTDNA.

When transferring to FamilyTreeDNA you need to be aware that if you participate in relative matching the company is now automatically opting all customers into Law Enforcement Matching. This means that law enforcement agencies in the US and their representatives who upload DNA profiles can access your name, your e-mail address and the amount of DNA you share with the law enforcement kits. Law enforcement matching is not restricted to US citizens but applies to the entire database regardless of country of residence. If you wish to opt out of Law Enforcement Matching you can do so from the Privacy and Sharing Page. If you wish to understand more about these issues you can read my article for Forensic Science International on Using genetic genealogy databases in missing persons cases and to develop suspect leads in violent crimes.

With thanks to Mats Ahlgren and Paul R Smith in the ISOGG Facebook group. See also Paul's blog post National Geographic Geno Project DNA ending.

Further reading
Genographic Project prepares to shut down consumer database by Roberta Estes, DNAeXplained

Saturday, 1 June 2019

Consuming genetics: ethical and legal considerations of new technologies - videos online

The Petrie-Flom Center at Harvard Law School recently held their annual conference which was devoted to the subject of “Consuming genetics: ethical and legal considerations of new technologies”. They very kindly recorded all the talks and have made them available online. You can access them from this link:

https://petrieflom.law.harvard.edu/events/details/2019-petrie-flom-center-annual-conference

I've only had time to watch a few of the talks but so far they are all of very good quality. I highly recommend that you take time to watch the very moving talk from Kif Augustine-Adams on "Generational failures of law and ethics: rape, Mormon orthodoxy, and the revelatory power of Ancestry DNA". It is a first-hand account of the disruptive power of genetic ancestry testing and the effects on families when long-held secrets are uncovered and promises of anonymity are breached.

It's also worth watching Liza Vertinsky's talk on "Genetic paparazzi vs. genetic privacy". In the UK DNA theft is illegal thanks to the Human Tissue Act passed in 2004. If you test someone's DNA without their consent you could potentially be put in prison. In the US no such laws yet exist and it is possible to test so-called "abandoned DNA" from discarded items without the individual's consent. I suspect it's only a matter of time before a celebrity's privacy is breached by testing their DNA without consent which is likely to cause a big backlash and encourage the introduction of new legislation.

I also recommend watching Natalie Ram's session on "Genetic genealogy and the problem of familial forensic identification" which is very topical in light of the current debates about law enforcement usage of genetic genealogy databases. Natalie highlights the inter-relatedness of DNA which means that informed consent becomes a non-issue. Even if you don't want to upload your DNA to GEDmatch, if your sister exercises her right to share her DNA you could still be caught up in a criminal investigation and have your family tree and your social media accounts trawled by the police.

Thursday, 30 May 2019

Using genetic genealogy databases in missing persons cases and to develop suspect leads in violent crimes

Last year I was invited by Rob Davis, an editor at Forensic Science International, to write an article about the privacy issues relating to the use of genetic genealogy in cold cases. My article "Using genetic genealogy databases in missing persons cases and to develop suspect leads in violent crimes" has gone through the peer review process and has now been published online. You can access the full article through my special author's link which will be valid until 19th July 2019:

https://authors.elsevier.com/a/1Z8MC1MCG0LzX~

I hope the article will educate people about all the issues involved and encourage policy makers to work on some suitable best practice guidelines to ensure that the technology can be used both effectively and responsibly.

You can see the full list of articles in the special Cold Case issue here.

Sunday, 26 May 2019


A major milestone was passed this week by AncestryDNA who announced that their "consumer DNA network has reached over 15 million completed samples".

We are seeing a rapid growth in the ancestry testing market in the UK. According to a YouGov survey last month an estimated 4.7 million Brits have already used a DNA testing service.  AncestryDNA do not give breakdowns by country but anecdotally we know that they have the largest market share in the UK and it seems likely that perhaps as many as two million Brits are already in the AncestryDNA database.

Below is the press release I received from AncestryDNA which also includes news of some updates expected later this year.
LEHI, Utah and SAN FRANCISCO, California, Tuesday, May 21, 2019 - Today Ancestry®, the global leader in family history and consumer genomics, announced its consumer DNA network has reached over 15 million completed samples. With the company’s growing network and innovative research tools, Ancestry can now provide customers with even more DNA matches, further detailed ethnicity insights, and ultimately, help more people around the globe discover their unique family story. 
“I have had a front row seat as the genetic genealogy industry has grown from a spark of an idea to a global phenomenon that has made statements like ‘My DNA says I am...’ commonplace in grocery stores, office buildings and family dinners,” said Diahan Southard, founder and author of Your DNA Guide and genetic genealogy educator. “Ancestry has been at the forefront of innovation and played a central role in this growth by making science exciting for everyone and providing meaningful insights into our origins and relationships. Every researcher knows that the more data we have, the more complete our story. With a network this large, coupled with millions of digitized records, everyone is sure to find out more about their own story.” 
“Ancestry is honored to play a role in empowering the journeys of personal discovery for 15 million people around the world,” said Cathy Ball, Chief Scientific Officer, Ancestry. “The size of this community is a true sign of how deeply important it is for people to connect and learn about their past. As the network continues to grow, we can deliver even more value to our members, including more granular insights about heritage, and provide compelling new paths to learn about ourselves using genetics.” 
The growing AncestryDNA network, combined with cutting-edge technology and content additions, gives new and existing Ancestry members ongoing value and new, rich information with their DNA results: 
New Communities: As the AncestryDNA network grows, Ancestry scientists are able to refine and discover more communities using Ancestry’s patented Genetic Communities™ technology – a proprietary technology that can connect people through their DNA to the places their ancestors lived and the paths they followed to get there over the past 75-300 years. Ancestry recently released 94 new and updated AncestryDNA communities for customers of African American and Afro-Caribbean descent, with even more communities launching soon. 
Refined Ethnicity Insights: As more people take the AncestryDNA test, Ancestry scientists are able to add additional samples to the reference panel, paving the way for more refined insights for members about their genetically inherited ethnicity. Thanks to the largest consumer DNA network, AncestryDNA is preparing another update for later this year which will include new ethnic regions, providing members with a more detailed view of their heritage. 
Even More Matches and Customer Discoveries: The size of the AncestryDNA network directly increases the quality and quantity of discoveries people can make using tools such as DNA Matches, and one of our newest features, ThruLines™. ThruLines (currently in BETA) can show common ancestors that members may share with their DNA matches and give a clear and simple view of how all matches are connected through that shared ancestor. With this innovation, combined with millions of Ancestry member trees, family tree building has never been easier, and the discoveries people can make are unprecedented. Additionally, now that the AncestryDNA network has over 15 million members, each AncestryDNA customer receives an average of 50,000 total matches – and that number grows by 2%-5% each month as more people join the network.

Sunday, 24 March 2019

Advanced Genetic Genealogy: Techniques and Case Studies

I've been sworn to secrecy for the last two years but I am now pleased to announce the publication of a new book Advanced Genetic Genealogy: Techniques and Case Studies, edited by Debbie Parker Wayne. I contributed a chapter on "The promise and limitations of genetic genealogy" where I had great fun speculating about what the future holds for genetic genealogy. There are another thirteen chapters contributed by many well known names in the genealogy world. I've not yet seen the book or had the chance to review any of the chapters so I'm very much looking forward to reading it when my copy arrives.

Debbie Parker Wayne has worked really hard behind the scenes to bring this much-needed book to fruition and I am very grateful for her patience and encouragement.

The book is currently on sale on Amazon. On the US site the book is showing as being available for shipping within the next one to two days. On the UK site delivery is expected within the next one to two months. A Kindle version will be available in May 2019. For US readers who are going to the National Genealogical Society conference in May you will be able to buy a copy from Books and Things who will be exhibiting at the conference. For details see here.

Here is the description of the book from Amazon:
Advanced Genetic Genealogy: Techniques and Case Studies helps intermediate researchers move up to the next level and advanced researchers apply the new DNA standards and write about DNA. This new book offers an in-home course in advanced genetic genealogy. Case studies demonstrate analyzing the DNA test results, correlating with documentary evidence, and writing about the findings, all incorporating the updated standards for using DNA. Full-color illustrations help the genealogist incorporate these techniques into personal or client research projects. Each of the fourteen chapters was written by a professional genealogist with DNA experience. 
Eight chapters study real families (some using anonymized identities), including methods, tools, and techniques. Jim Bartlett covers how to triangulate a genome (mapping DNA segments to ancestors). Blaine T. Bettinger demonstrates the methodology for visual phasing (mapping DNA segments to the grandparents who passed down the segment to descendants, even when the grandparents cannot be tested). Kathryn J. Johnston shows how to use X-DNA to identify and confirm ancestral lines. James M. Owston describes findings of the Owston Y-DNA project. Melissa A. Johnson covers adoption and misattributed parentage research. Kimberly T. Powell provides guidance when researching families with endogamy and pedigree collapse. Debbie Parker Wayne combines atDNA and Y-DNA in a Parker family study. Ann Turner describes the raw DNA data and lab processes. 
Three middle chapters cover genealogy standards as they relate to DNA and documentary evidence. Karen Stanbary applies the Genealogical Proof Standard to genetic genealogy in a hypothetical unknown parentage case illustrating start-to-finish analysis. Patricia Lee Hobbs uses atDNA to identify an unknown ancestor and that ancestor's maiden name, moving back and forth between documentary and DNA evidence. Thomas W. Jones describes best practices for genealogical writing and publishing when incorporating DNA evidence. 
Three concluding chapters deal with ethics, emotions, and the future. Judy G. Russell covers ethical considerations. Michael D. Lacopo describes the effect on relationships when family secrets are uncovered, surfacing issues for all concerned. Debbie Kennett covers the current limitations and future promise of using DNA for genealogy. An extensive glossary, list of recommended resources, and index are included.
Disclosure
If you click on the Amazon UK links in this blog it is vaguely possible that at some point in the distant future I might receive a microscopic payment from Amazon as part of their affiliate scheme to help support my writing. Using the affiliate links makes no difference to the prices you pay. 

Tuesday, 5 March 2019

Ancestry updates at Rootstech – ThruLines, Tree Tags and Improved DNA Matches

At Rootstech last weekend Ancestry announced the launch of three new features: Tree Tags, ThruLines, and New and Improved Matches. The announcement was made in the keynote speech by Margo Georgiadis, Ancestry's new CEO, which you can now watch on YouTube.



Margo stated that Ancestry host 100 million family trees, and that they "will soon have more than 15 million people" in their DNA network, making it the largest consumer DNA database in the world.

Crista Cowan provided a live demonstration of the new tools in her presentation "What you don't know about Ancestry". The recording of her talk is now available on the Rootstech website and is well worth watching if you want to get a good overview of the new features. I also recommend watching Diahan Southard's talk on Connecting your DNA matches, which shows how to use the tools to form genetic networks, but also highlights some of the limitations.

The Tree Tags and New and Improved Matches are currently in beta testing. You can opt in to the beta by accessing the new AncestryLab menu on your Ancestry account. This can be found under the Extras tab. ThruLines has been rolled out to the entire AncestryDNA database and you will see the feature when you log into your DNA account. The screenshot below shows what my home page now looks like.



The new matches experience replicates the functionality of the third-party Chrome extensions MedBetter DNA and DNA Match Labelling, which many of us found very helpful for managing our matches. These extensions are now redundant. If you have previously used the extensions you need to be aware that they won't work with the new system. Before participating in the beta you might want to make a note of all your groupings and tags so that you can transfer them to the new interface.

Because these new tools are all in beta testing it's important to remember that the features may change and new functionality might be added. If you spot any bugs or have suggestions for improving the tools make sure you submit feedback to Ancestry.

So now let's have a look at the Improved Matches and ThruLines features and see how they work in practice.

New and improved DNA matches
I currently have 24,900 matches at AncestryDNA, which is far more than I could ever possibly hope to investigate. Any tools that will help me to sort and filter these matches to find the most useful ones are always going to be welcome. Below is a screenshot of my new matches page. I've tested both my parents and matches are now allocated to the father's side and the mother's side. However, the parental sides are only shown for my 162 fourth cousin and closer matches. It would be really helpful if Ancestry could extend this tool to show the sides for the more distant matches, or at least for the first 1000 or 2000 matches on your match list. You will also notice that the amount of shared cM has been rounded to the nearest whole number.


I always write fastidious notes about my matches and the notes can now be viewed directly from the matches page without having to click through to view the match. This functionality was previously only available when using the MedBetter DNA extension. You won't be able to see the full note on the matches page, but you can click on the note symbol, as in the screenshot above, to read the full text.

Previously we could view 50 matches on a page. Now there is an infinite scroll system which means that you can keep scrolling down the page and see more and more matches. This is very handy if you're trying to search by keyword for particular matches. I can now, for example, see all my fourth cousin and closer matches on a single page. Previously you could use the page numbers to work out how many pages of matches you had, which then allowed you to calculate the total number of matches. I can't currently see any way of replicating this function with the new system. (*See the update at the end of this article.) If you want to know how many matches you have you can use the DNAGedcom Client to download your entire match list. The Client is available for a modest subscription from DNAGedcom.

The updated match list has a number of new filters for sorting your matches. You can sort by close matches, distant matches, matches you haven't viewed, matches with notes, matches you've messaged and tree status (private linked trees, public linked trees and unlinked trees). Unfortunately the facility to filter matches by sub-region has temporarily been lost but I understand that it will eventually be restored.

The good news is that it's now possible to create custom groups which can be labelled and assigned a colour. This new feature is modelled very closely on the DNA Match Labelling extension, but provides additional functionality such as the ability to filter matches by custom group. You can also have more than one coloured dot for each match. There are 24 different colours available. I'm still experimenting with the coloured dots but an obvious use of the custom groups is to assign different colours to specific surnames or ancestral couples.

I've added a coloured dot for matches which don't have any shared matches. I can use this filter to go back and check these matches from time to time to see if they do now have any shared matches. Currently 23 of my 162 fourth cousins or closer (14%) don't have any shared matches. It will be interesting to see if the percentage drops over time as more matches start to come in.

I also have a coloured dot for what I call "dodgy matches". These are matches which share substantially less DNA with my parents than they do with me and are therefore not likely to be worth pursuing. Currently 29 of my 162 fourth cousin or closer matches (18%) fall into this dodgy category. In some cases there is a discrepancy of 10 cM or more. One match appears to share 22.3 cM with me but shares just 6.5 cM with my dad. Two of my dodgy matches actually share small amounts of DNA with both of my parents. One appears to share a single 21.2 cM segment with me but this is actually two segments: a 7.8 cM segment shared with my mum and a 6.5 cM segment shared with my dad. My second double-sided match shares 20.1 cM across 2 segments with me but this translates into a 6.1 cM segment shared with my mum and an 11.9 cM segment shared with my dad. I've not found any genealogical connection between my mum and dad within the last 400 years and these double matches are likely to be signals of very distant sharing dating back hundreds of years.
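As a rough illustration, the comparison described above can be sketched in a few lines of Python. This is only a sketch under stated assumptions: the `Match` record, the `is_dodgy` function and the 5 cM threshold are all hypothetical names and values chosen for the example, not anything provided by Ancestry, and the shared cM figures would have to be typed in by hand from the match pages.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Match:
    """Hypothetical record of one match, with shared cM entered by hand."""
    name: str
    cm_with_me: float
    cm_with_mum: Optional[float] = None  # None means no match with this parent
    cm_with_dad: Optional[float] = None


def is_dodgy(match: Match, min_gap_cm: float = 5.0) -> bool:
    """Flag a match that shares substantially more DNA with the child
    than with either parent. The 5 cM default is an assumed threshold."""
    best_parent_cm = max(match.cm_with_mum or 0.0, match.cm_with_dad or 0.0)
    return match.cm_with_me - best_parent_cm >= min_gap_cm


def percentage(part: int, total: int) -> int:
    """Share of matches in a group, rounded to the nearest whole percent."""
    return round(100 * part / total)


# The three examples quoted in the paragraph above.
examples = [
    Match("single-sided", 22.3, cm_with_dad=6.5),
    Match("double-sided A", 21.2, cm_with_mum=7.8, cm_with_dad=6.5),
    Match("double-sided B", 20.1, cm_with_mum=6.1, cm_with_dad=11.9),
]
for m in examples:
    print(m.name, "dodgy" if is_dodgy(m) else "plausible")

print(percentage(29, 162), "% of fourth cousin or closer matches")
```

With a 5 cM threshold all three examples are flagged. A stricter threshold of 10 cM would let the third one through, so the cut-off is a judgment call rather than a fixed rule.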

The screenshot below shows some of the custom groups I've started to use. No doubt my system will evolve over time. The coloured dots currently sort alphabetically so I might at some point decide to introduce a numbering system to get them to sort in a particular order.

Common ancestors
AncestryDNA's shared ancestor hints, otherwise known as shaky leaf hints, have now been renamed as common ancestors. I previously only ever had six shared ancestor hints and two of those were with my parents. I now have 20 common ancestors, though again two of those are with my parents. The shaky leaf hints only used our family tree and the trees of our matches to identify a shared ancestor. The common ancestors feature compares family trees and then deploys the power of other people's family trees to try and identify the common ancestor. The common ancestor feature is a useful tool to guide your research but every link will need to be carefully checked, and extreme caution needs to be exercised when matches are identified with more distant cousins.

Below is an example of a predicted common ancestor with a half sixth cousin. Ancestry have used my tree and my cousin's tree and then used a third-party tree to link the two lines together. Tidbury is indeed a name which appears in my family tree. The surname is concentrated in Berkshire and Hampshire and it's quite likely that I share a genealogical relationship with this match. However, the Tidburys are on my mother's side and this cousin matches me through my father so there is clearly no genetic connection. Even if the genetic connection was real, the total amount of DNA shared is very small and there is currently no way of assigning a single small segment like this to a sixth cousin with confidence.


ThruLines
ThruLines is a replacement for DNA Circles. The DNA Circles feature will continue in parallel to ThruLines for now but is no longer going to be updated. While DNA Circles was an interesting concept, in practice it was only of use to a subset of the AncestryDNA database because of the strict requirements needed to create a circle. I only ever had one DNA Circle, but AncestryDNA have identified 57 potential ancestors for me with the ThruLines feature.

ThruLines is now in open beta testing and any Ancestry member who meets the following criteria will receive the feature free of charge for a limited time:
  • Your AncestryDNA results must be linked to a public or private searchable family tree.
  • You must have DNA matches who have also linked their results to a public or private indexed family tree.
  • Your linked family tree needs to be well built out. It should be 3-4 generations deep to have the best chance of ThruLines finding new discoveries for you to explore.
AncestryDNA have said that the feature will come out of beta testing when they have had enough feedback to validate the value of the tool to their customers, including whether or not the feature will require a subscription.

With ThruLines you get a DNA record card for each ancestor for whom you have a DNA match. If you click on the card you will get a report showing the possible pathways through which you and your matches are connected to the ancestor. The pathways are filled in not just from your own family tree or the family trees of your matches but from multiple Ancestry family trees. This is essentially a machine-learning algorithm deploying the power of big data to make connections. I would guess that, like the We're Related App, the feature is generated by the AncestryDNA Big Tree. For information about the Big Tree I recommend reading the very informative blog post by Randy Seaver "Is there really an Ancestry.com Big Tree?"

You can filter the ThruLines in three ways.
  1. If you filter by ancestors from your linked tree you will be given descendant reports on your matches who meet the required criteria and  who share the ancestors featured in your tree. This is a very useful way of locating matches who descend from a specific ancestor.
  2. If you filter by potential ancestors you will be able to view links that Ancestry have made between you and your matches by stitching together multiple family trees to make connections.
  3. The third option is all ancestors which will show you all your potential ancestors and all ancestors from your linked tree.
ThruLines goes back seven generations to fifth great grandparents which means that you will not get potential ancestor hints with anyone who is not a sixth cousin or closer.
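The arithmetic behind the sixth cousin limit can be spelled out with a short Python sketch. The function names here are hypothetical and the code simply applies the standard convention that people who share grandparents are first cousins, so each extra generation back adds one cousin degree. It assumes both people are the same number of generations from the shared ancestors and ignores half relationships and cousins who are removed.

```python
def ordinal(n: int) -> str:
    """Return 1st, 2nd, 3rd, 4th ... for a positive whole number."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def ancestor_label(generations_back: int) -> str:
    """Name the ancestors found a given number of generations back
    (1 = parents, 2 = grandparents, 3 = great-grandparents ...)."""
    if generations_back < 1:
        raise ValueError("generations_back must be at least 1")
    if generations_back == 1:
        return "parents"
    if generations_back == 2:
        return "grandparents"
    greats = generations_back - 2
    if greats == 1:
        return "great-grandparents"
    return f"{ordinal(greats)} great-grandparents"


def most_distant_cousin_degree(generations_back: int) -> int:
    """Cousin degree for two people who are both this many generations
    below the shared ancestors (2 generations = first cousins)."""
    if generations_back < 2:
        raise ValueError("sharing parents makes siblings, not cousins")
    return generations_back - 1


print(ancestor_label(7))              # 5th great-grandparents
print(most_distant_cousin_degree(7))  # 6
```

Seven generations back therefore means fifth great-grandparents, and the most distant same-generation relatives who can share those ancestors are sixth cousins.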

The example below shows a genetic descendancy report for my great-great-great-grandfather Samuel Trask. It shows my line of descent on the left and then the lines of three of my matches and their lines of descent down to the present day with the speculative links highlighted with dotted boxes. In this particular case the connections were hard to make because these matches were related to me through a female line. I hadn't noted these changes of surname upon marriage in my family tree so I might not have spotted these connections easily without this feature. However, caution will need to be exercised and the trees of these matches will need to be evaluated independently. I will also have to check the amount of shared DNA to see if it is consistent with the hypothesised relationships.


The example below shows a potential ancestor that Ancestry have identified for my great-great-great-grandfather Thomas Thorn. Two cousins who are potentially related to me through Thomas Thorn are shown, and Ancestry have also identified William Thorn as Thomas's possible father because of a DNA connection through Robert Browning Thorne. I have insufficient evidence at present to evaluate this possible link but it may be that I will get more matches on these lines in future which will provide further clues.


One disadvantage of the ThruLines is that the potential ancestor feature is triggered even if your only DNA match is with a parent. As I've tested both of my parents I'm finding that a lot of the potential ancestors are only appearing because of a single parental match. Ideally parents should be excluded from the feature or there should be the ability to filter out matches with parents.

The feature seems to be most useful for identifying how you are connected to your DNA matches. It's particularly helpful for the fifth to eighth cousin matches which are otherwise very difficult to search. In many cases the matches only share very small amounts of DNA under 10 cM and I'm not convinced that the DNA match is a result of the documented genealogical connection. Nevertheless, it's still useful to identify more genealogical cousins and to have the opportunity to communicate with them about our shared surnames and family trees.

Probabilities
A nifty bonus feature of the Improved DNA Matches and the ThruLines is that you can now generate a table showing the possible relationships and the probabilities of those different relationships. This is very similar to the Shared cM Tool on the DNA Painter website. The DNA Painter tool uses probabilities based on simulated data from the AncestryDNA Matching White Paper combined with ranges derived from the Shared cM Project. These new tables from Ancestry are based on simulations. They've not released any details but I'm hoping that they will publish a white paper or blog explaining the methodology behind their calculations. You can access the probabilities table from your new matches page by clicking on the question mark next to the amount of shared DNA. You can access the tool from the ThruLines feature by clicking on the amount of DNA you share with your matches in the descendancy chart.

Here is the probability table from Ancestry for one of my second cousins with whom I share 168 cM.

Interestingly, Ancestry have predicted that this person is my third to fourth cousin whereas the probabilities show that he is more likely to be a second cousin.

If I plug the amount of shared DNA (168 cM) into the Shared cM Tool at DNA Painter I get the following probability table which suggests that this cousin is slightly more likely to be a half second cousin than a full second cousin but still has a reasonable probability of being a second cousin.
I've found that Ancestry's probabilities are not so realistic for the more distant relationships in the fifth to eighth cousin category where people share much smaller amounts of DNA. Most of the matches identified with the ThruLines feature appear to fall into this category. The Ancestry simulations don't seem to have gone beyond the fifth cousin level. As a result, the probabilities suggest that almost all of your DNA matches, however little DNA they share, will be fifth cousins or closer. Here's the probability table for the predicted half sixth cousin I mentioned above who shares just 10 cM with me. The table suggests that 98% of cousins sharing 10 cM will be fifth cousins or closer. A half sixth cousin doesn't even appear as a possibility.


This prediction is actually in conflict with Ancestry's confidence scores. Matches sharing 6-16 cM are classified as moderate confidence matches with only a 15-50% chance of sharing a single recent common ancestor. Ancestry say that for moderate confidence matches "You and your match might share DNA because of a recent common ancestor or couple, share DNA from very distant ancestors, or you might not be related."

Two separate studies have also shown that 10 cM segments can potentially go back a long way and can perhaps date back twenty or thirty generations. A chart from a 2012 study by Speed and Balding reproduced in the ISOGG Wiki shows that less than 40% of 10 cM segments are likely to fall within the last 10 generations. The Speed and Balding paper used computer simulations. A 2013 study from Ralph and Coop, using real life data from the European POPRES dataset, showed that while most 10 cM segments are likely to come from the last 500 years, we have many more very distant cousins so the vast majority of 10 cM segments will be shared through very distant ancestors. The authors go on to say that "the typical age of a 10 cM block shared by two individuals from the United Kingdom is between 32 and 52 generations". The distribution will vary depending on a given population's shared history.

I would suggest that Ancestry need to refine the parameters of their simulations to allow for more distant sharing and to produce more realistic probability tables for matches sharing low amounts of DNA. In the meantime, when weighing up the strength of DNA evidence, you need to bear in mind the uncertainties in the predictions when sharing small amounts of DNA. If you share a single 10 cM segment with a cousin there is currently no way of determining whether you are matching as sixth cousins or sixteenth cousins.

Conclusion
These new features from Ancestry are very welcome. The new matching interface is already making it much easier to navigate and sort our matches. ThruLines is likely to be a valuable tool for identifying potentially interesting matches to follow up. The predictions are likely to get better over time as more data becomes available and as the machine learning improves. Remember that any tool has to be used carefully. It provides clues but does not give you all the answers. You will still need to evaluate the DNA evidence in combination with the genealogical evidence.

Update 7th March 2019
Check out the All Matches filter in your new match list. AncestryDNA have now included stats on the total number of matches, the number of close and distant matches, the number of new matches and matches shared with your mother or father. I now have 25,089 matches, 166 of which are predicted to be fourth cousins or closer.

Further reading and resources

Thursday, 3 January 2019

What we learned about fighting bad science by taking on a genetic ancestry testing company

The following blog post was written by David Balding and Debbie Kennett. It is based on an article written in collaboration with Mark Thomas and Adrian Timpson entitled The rise and fall of BritainsDNA: a tale of misleading claims, media manipulation and threats to academic freedom, published in the peer-reviewed journal Genealogy. In just a few weeks the article has achieved the distinction of being the most viewed article in the journal's history. As of today's date it has been seen 3,658 times, and 2,190 people have downloaded a copy of the article. The blog post was originally intended for publication in The Conversation. However, the piece was subsequently rejected because the website's lawyer considered that it was "potentially defamatory in its current state". The Defamation Act of 2013 includes a provision for matters of public interest and provides special privileges for statements published in peer-reviewed journals. We believe that there is a strong public interest in highlighting this story. It is important that academic debate is not stifled by legal threats. There is nothing in the blog post which is not already referenced in our peer-reviewed article. We have therefore published it below in its entirety.

The worlds of academia and industry are getting closer than ever before. Academic scientists are encouraged to engage directly with industry through consultancy roles, and to commercialise their research through the creation of new enterprises. At the same time, research institutions encourage promotion of resulting new findings to a broad public through the news media.

These trends can lead to conflicts of interest. Media-savvy companies can and do attract free coverage for their science-related business under the guise of a public interest science story. It is possible that universities could collude with this deception in their eagerness to attract media attention by allowing a scientist to use the university brand in media presentations, without acknowledging the business motivation.

Our new case study of the former consumer genetic ancestry testing company BritainsDNA, published in the journal Genealogy, sheds light on how conflicts of interest can play out in reality.

Genetic ancestry tests are important tools for genealogists when used in combination with documentary and historical records. Y-chromosome DNA (Y-DNA) tests can be used to trace a man’s paternal ancestry, while mitochondrial DNA (mtDNA) provides information about ancestry on the direct maternal line. There are also autosomal DNA tests (the autosomes are the chromosomes other than the X, Y and mtDNA, and contain most of your DNA) which are useful in finding matches with genetic relatives in a database. Autosomal DNA tests are now the most popular tests. Ancestry testing is a multi-million-dollar industry, and around 18m people have now tested worldwide.

Such tests can be very reliable in revealing ancestry in recent generations. However, once you go beyond about 10 generations back, only a small fraction of your ancestors will have contributed to your DNA. So while there’s a lot of research on human history through DNA, there is little that can be said that is specific to the customer. That means these tests cannot be used on their own to determine exactly where you came from.
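
To give a rough sense of the numbers involved, here is a minimal Python sketch of how quickly the number of ancestors grows and how small the average autosomal contribution from each one becomes. The function names are our own and purely illustrative. The calculation is a deliberate simplification: it assumes every slot in the family tree is filled by a different person (no pedigree collapse) and it only gives the average share. In reality DNA is inherited in chunks, so the share from any one distant ancestor varies widely and is often zero.

```python
def ancestors_in_generation(n: int) -> int:
    """Number of ancestor slots n generations back (parents are n = 1)."""
    return 2 ** n


def expected_share(n: int) -> float:
    """Average fraction of autosomal DNA per ancestor slot n generations back."""
    return 1 / ancestors_in_generation(n)


if __name__ == "__main__":
    # Print a small table for a few illustrative generation depths.
    for n in (1, 2, 5, 10, 15):
        print(
            f"{n:>2} generations back: "
            f"{ancestors_in_generation(n):>6} ancestor slots, "
            f"average share {expected_share(n):.4%}"
        )
```

At ten generations there are already 1,024 slots in the tree, so the average share per ancestor is under a tenth of one per cent. That is why so little can be said about any single distant ancestor from a customer's DNA alone.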

The case of BritainsDNA
BritainsDNA was active before the growth of the autosomal DNA databases and focused on Y-DNA and mtDNA testing. They were able to achieve substantial favourable coverage in newspapers, radio and television, with stories drawing questionable links for example between contemporary British people and the Queen of Sheba.

In another promotion, the public service Welsh-language TV channel S4C ran a five-part series called “DNA Cymru” investigating the question of “Who are the Welsh?”. To participate, members of the public were invited to buy a Y-DNA or mtDNA test from the company’s Welsh website.

But the results of this “research” were not published in a scientific journal. Instead viewers were regaled with stories about the ancestry of celebrities, for example, that their Y-DNA or mtDNA results indicated they were ancient Welsh, pioneers or Rhinelanders. Yet these descriptions are so generic that they apply to ancestors of almost anyone: they are essentially meaningless. Y-DNA and mtDNA comprise just 2% of our DNA, and convey very limited information about the history of a nation.

So how could this happen? Two principal actors in the company were a geneticist from The University of Edinburgh and an historian and former television executive who at the time held an unpaid position as Rector of St Andrews University.

The university roles of the company’s directors were used to lend credibility to the promotions. The media outlets did not seek the views of other scientists, who would have contested many of the claims. Few journalists have scientific training, which can allow sensationalised or unbalanced reporting. And while most scientists can be relied upon to be objective, journalists need to be aware that research-related commercial interests can affect scientists’ motivations.

Challenging the claims
We formed part of a small group of concerned scientists who tried to challenge this avalanche of marketing disguised as science. This was prompted by an interview on the prestigious BBC Radio 4 Today programme, which described a “massively subsidised” project to study the DNA of Britons as “bringing the Bible to life”. In fact, the interviewee was there to sell DNA tests and the BBC interviewer turned out to be an old chum. Our challenges were met with legal threats from the company, and resistance to acknowledging editorial failure from the BBC and other media.

Many scientists don’t speak out because of a fear of legal action. We were fortunate to have strong support from the then UCL Provost, and from many colleagues. So we decided to continue to challenge the misleading claims. We were also encouraged by the science writer Simon Singh, who had himself been sued by the British Chiropractic Association for critical comments made in The Guardian. Although the case against Singh was eventually dropped, he suffered years of personal stress and substantial unrecovered legal costs. We also received support from the charity Sense About Science, and we worked with them to prepare the pamphlet Sense about Genetic Ancestry Testing and an article on Sense about Genealogical DNA Testing.

The satirical magazine Private Eye was the only media outlet to see through the company’s misleading media campaign from the start. But eventually we had complaints upheld by the BBC, which also aired a radio documentary that partly corrected previous claims.

Over a period of years we got the upper hand. The company did not pursue its legal threat and eventually went out of business. With a move to genome-wide genetic data, containing more information than is available from Y-DNA and mtDNA, there is reduced scope for fanciful storytelling today. However, there remain problems with ancestry companies failing to reveal limitations of their analyses or to indicate uncertainty in inferences. The population labels that are used are not well defined and can conform to outdated notions of race and identity.

Our story has wider implications about the relationships between business and academia and the reporting of science stories in the media. We hope that our case study will be used to inform media training and education programmes, and that universities monitor the abuse of academic position to advance business interests. Most importantly, we hope that other scientists will be encouraged by our experience and will not be afraid to speak out against bad science.

Further reading
Academics pan Melrose-based DNA business - an overview of our paper from Ewan Lamb on the Not Just Sheep and Rugby blog.
Talking Headlines with Debbie Kennett - My interview with Talking Headlines about our BritainsDNA paper, the lessons learnt and how to detect fake science news.