Wednesday, 25 September 2019

US Department of Justice Interim Policy on Forensic Genetic Genealogical DNA Analysis and Searching

The US Department of Justice have issued an Interim Policy on Forensic Genetic Genealogical DNA Analysis and Searching. The press release can be found here:

https://www.justice.gov/opa/pr/department-justice-announces-interim-policy-emerging-method-generate-leads-unsolved-violent

Here is a direct link to the policy statement: https://www.justice.gov/olp/page/file/1204386/download

The announcement was made by Ted Hunt, Senior Advisor to the Attorney General on Forensic Science at the Department of Justice, during a talk given at the International Symposium on Human Identification in Palm Springs, California.

Investigative genetic genealogy or forensic genealogy is a powerful tool which can help to solve crimes and identify missing persons but, as with any such tool, it needs to be used responsibly. Earlier this year I wrote an article for Forensic Science International on Using genetic genealogy databases in missing persons cases and to develop suspect leads in violent crimes. I explain in the article how the methodology works but I also discuss some of the privacy implications and the need for ethical and regulatory oversight. These new guidelines from the DOJ are long overdue and this document will be a good starting point for further debate but there are a number of important issues that have not been addressed and which I will discuss briefly below.

Informed consent
The following sentence in the report is highly misleading: "The FGG [forensic genetic genealogical] profile is... compared by automation against the genetic profiles of individuals who have voluntarily submitted their biological samples or entered their genetic profiles into these GG services (‘service users’)."

FamilyTreeDNA changed their terms of service in March this year. All European Union customers who tested prior to the change were automatically opted out of law enforcement matching. Going forwards, all customers, including all EU users, were automatically opted in to law enforcement matching. New settings were introduced so that customers could opt out. Previously if you didn't want to share your results with law enforcement you had to switch off matching altogether, thus losing all the benefits of the genetic genealogy database. While FTDNA customers have voluntarily uploaded their samples for their family history research, because of the automatic opt in they cannot in any way be considered to have given consent to have their profiles matched with law enforcement kits. The DOJ perhaps doesn't realise that FTDNA has an international database. There are international conventions on the transfer of DNA data for people in criminal databases (eg, through Interpol or the Prüm Convention in Europe). Innocent people in genetic genealogy databases should expect similar protections if their data is to be used by law enforcement agencies in foreign countries. I would like to see the DOJ make a requirement that, at the very least, DNA profiles of non-US citizens can only be used if the customers have given explicit informed consent. Ideally they should insist that FTDNA follows the standard convention and adopts an opt in policy for everyone.

GEDmatch customers have voluntarily uploaded their profiles to the database and, following the decision in May this year to require all users to opt in to law enforcement matching, anyone who has exercised this option has given their consent to have their data used for law enforcement investigations.

Proportionality
The DOJ guidelines include the following criteria which must be met before a genetic genealogy search can go ahead:

Before an investigative agency may attempt to use FGGS [forensic genetic genealogical DNA analysis and searching], the forensic profile derived from the candidate forensic sample must have been uploaded to CODIS, and subsequent CODIS searches must have failed to produce a probative and confirmed DNA match.

The investigative agency with jurisdiction of either the crime or the location where the unidentified human remains were discovered (if different) must have pursued reasonable investigative leads to solve the case or to identify the unidentified human remains.

Proportionality is a key concept in the criminal justice system and priority should always be given to the most effective and least intrusive methods. It is notable that in a number of the forensic cases where genetic genealogy databases have been used, the police have failed to use existing and well proven methods.

In some cases, the suspect had already had prior contact with the police or had even been in prison but the police had failed to get a DNA sample. See for example here, here and here. Jerry McFadden was identified through genetic genealogy as the murderer of Anna Marie Hlavka. But McFadden was executed in 1999 without having his DNA taken.

If matches are to be found in the CODIS database then surely it makes sense to ensure that all convicted offenders are in the database. Yet, according to a report from Forensic Magazine in 2017 there is a big backlog of prisoners who have not been tested:
...seven states hold prisoners whose DNA had not been collected, and who were not in CODIS. Most often, these states had no retroactivity conditions in their DNA laws, which were generally enacted in the 1990s and were never extended into the past to include criminals already locked up. But there are other cases where prisoners refused to give samples, or authorities simply didn’t get the testing done for logistical reasons. For others, there were simply collection delays.
There is also a huge backlog of untested sexual assault kits in the US. Astonishingly it's not even possible to determine the size of the backlog but there are probably tens of thousands of kits which still haven't been tested. The End The Backlog website keeps track of the problem.

I cannot understand why these backlogs have been allowed to develop. There must be many victims of crime who have yet to receive justice and potentially many innocent people who could be exonerated if this testing were done.

Familial searching has far fewer privacy implications than genetic genealogy because the people whose DNA is being searched have already committed a crime and can therefore be considered as having forfeited the right to privacy. Familial searching involves a search of the police database to identify people who are first-degree relatives of the suspect (eg, the parent, child or sibling). It works on the basis that crime tends to run in families so, even if the perpetrator himself is not in the database, it's quite possible that he could be identified through a match with another family member.

Some of the cases where genetic genealogy has been used could have been solved much earlier if familial searching had been used. A number of the suspects, including the Golden State Killer, had a brother who already had a criminal record but sadly his DNA was not in the police database.

Familial searching is currently only done in a handful of states such as California, Texas and Colorado. The practice is banned in Maryland and in Washington DC. Familial searching could potentially produce tens of thousands of useful investigative leads. The Florida Department of Law Enforcement have set an excellent example by adopting a policy whereby genetic genealogy is only used if a familial search has produced a negative result. It would be good to see the DOJ updating their policy to follow Florida's example and insist that genetic genealogy should only be used after an unsuccessful familial search.

The National DNA Index System (NDIS), the federal CODIS database, has over 16 million DNA profiles. DNA searches in the national database would probably produce valuable leads in thousands of cases. For some reason the FBI has resisted allowing the use of NDIS for familial searching. However, if the DOJ is happy for genetic genealogy databases to be used then the FBI's position is clearly untenable and familial searching should be introduced as a matter of priority.

Ethical oversight
The policy makes no provision for ethical oversight. Genetic genealogy searches are left to the discretion of the investigative agencies and the prosecutors. Agencies are required only to use companies which "provide explicit notice to their service users and the public that law enforcement may use their service sites". Law enforcement and prosecutors are necessarily going to want to push for the use of genetic genealogy. There is an urgent need for some form of independent oversight to provide balance in the system. Again, Florida is setting an excellent example. Because genetic genealogy is only used in Florida when familial searching has been unsuccessful, all searches must be approved by a Familial Search Review Committee. I would like to see other states follow this example.

Mechanics of genetic genealogy searches
The policy recommends that "The investigative agency shall, if possible, configure service site user settings that control access to FGG profile data and associated account information in a manner that will prevent it from being viewed by other service users." At GEDmatch kits can be uploaded as research kits which means that the user will receive a list of matches but will remain invisible to the people on the match list. At FamilyTreeDNA law enforcement kits appear in the match list of everyone who has not opted out of law enforcement matching. They are not distinguished in any way from those of ordinary users. I understand that some people are trying to encourage FTDNA to adopt a similar system to GEDmatch so that the law enforcement kits cannot be seen. The DOJ missed a big opportunity here and could have insisted that the police only use companies which restrict matching in this way which would put pressure on FTDNA to adopt best practices.

The report discusses the procedures for target testing which is sometimes done when the matches are more distant. Testing someone who is potentially a second cousin to the suspect helps to narrow down the search and confirm that the investigators are pursuing the right line.  The report says that "An investigative agency must seek informed consent from third parties before collecting reference samples that will be used for FGGS". However, they do not discuss the tricky subject of incidental findings, such as the discovery that a target tester is not related to his or her presumed biological family in the expected way.

I am sure that there is much more I could write but I just wanted to set down a few initial thoughts about the areas which I think are the most important.


Sunday, 15 September 2019

New DNA and subscription bundle from Findmypast for UK subscribers

Findmypast are offering a free DNA kit to new UK subscribers for a limited time only. Findmypast have a partnership with Living DNA and the test on offer is the Living DNA test packaged under the FindmypastDNA brand name. Note that the test offered in this bundle is an autosomal test only. If you wish to have the Y-DNA and mtDNA haplogroup reports, which are included with the standard Living DNA test, you would need to upgrade.  

Full details of the offer are included in the following press release from Findmypast which I received on 4th September 2019.  As far as I'm aware this offer is still available. It is restricted to the UK market. I do not know if there are any plans to extend the offer to other countries.
FINDMYPAST ANNOUNCES NEW DNA & SUBSCRIPTION BUNDLE FOR UK MARKET
 
  • While stocks last, all 12 month Pro or Plus subscriptions now come with a free Findmypast DNA kit worth £79
  • New offering allows users to combine cutting-edge science with Findmypast’s archive of more than 9 billion historical records
Leading family history website, Findmypast, has announced that as of today, any UK customer who purchases a 12-month Plus or Pro subscription will receive a Findmypast DNA kit worth £79, completely free of charge whilst stocks last. 
The new bundle combines cutting edge science and traditional family history research methods, allowing family historians to explore their past in more depth than ever before.
Launched in partnership with fellow British brand, Living DNA in November 2018, this first of its kind service uses Living DNA’s unique test employing cutting-edge science to provide a breakdown of 21 regions across Britain and Ireland, connecting family history enthusiasts with the records they need to bring their ancestors’ stories to life. 
Those looking to take advantage of this incredible offer can choose from: 
  • A 12-month Plus subscription priced at £120 – covering all record categories, perfect for expanding your family tree and adding colour to your research
OR
  • 12-month Pro subscription priced at £156  - providing full access to everything Findmypast has to offer, including the largest online collection of British & Irish newspapers
AND…
  • They will receive a Findmypast DNA kit completely FREE plus free delivery (normally £79)
Once you have purchased your Plus or Pro Subscription, or upgraded from a starter sub, simply claim your free kit by visiting /www.findmypast.co.uk/subscribe?dnaoffer=true.  All free kits must be claimed no later than 30 days after the subscription purchase.

Wednesday, 10 July 2019

MyHeritage enters the genetic health testing market


MyHeritage announced the launch of a new Health and Ancestry test on 20th May this year. The launch occurred at a busy time of the year and I've only just got round to investigating it. On doing so, I discovered that the test is on special offer with 40% off the recommended price until 15th July so I decided to take the plunge and order a kit. I already have a MyHeritage DNA account but I have not tested direct at MyHeritage DNA. Instead I transferred my raw data file from AncestryDNA in order to participate in the genetic genealogy matching database. I received various e-mails from MyHeritage about the new health test but none of them mentioned a discount or special introductory offer and the 40% discount does not show up when I log into my existing MyHeritage account. I don't know if this special offer is available to everyone in the UK or if a similar offer exists in other countries so you will need to check for yourself. Make sure you log out of your MyHeritage account first. There was an additional charge of £9 for shipping so the test cost me £114 in total, a saving of £74.
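
As a rough check on those figures, here is a minimal sketch of the arithmetic in Python. It assumes that the £74 saving applies to the kit price alone and not to the shipping charge. The variable names are my own, and the list price is inferred from the numbers above, not quoted from MyHeritage.

```python
# Figures quoted in the post, in pounds sterling.
total_paid = 114   # kit plus shipping
shipping = 9
saving = 74        # assumption: the saving applies to the kit price only

# Work back from the total to the kit price and the implied list price.
discounted_kit = total_paid - shipping
implied_list_price = discounted_kit + saving
discount_pct = 100 * saving / implied_list_price

print(f"Discounted kit price: £{discounted_kit}")
print(f"Implied list price:   £{implied_list_price}")
print(f"Implied discount:     {discount_pct:.1f}%")
```

On these assumptions the discount works out at roughly 41%, which is consistent with the advertised 40% off.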


MyHeritage is now using the Illumina Global Screening Array (GSA) for their Health and Ancestry test. This has a different range of markers (SNPs) compared to the AncestryDNA profile I uploaded to my other MyHeritage account. Having two accounts at MyHeritage will allow me to do comparisons to assess the effectiveness of the relationship predictions with the two different chips.

Included with the cost of the test is a free 12-month subscription which will allow me to receive all the new genetic risk and carrier status reports as they are released. The health subscription also includes all of the advanced DNA features for genealogy (such as viewing family trees of DNA matches, viewing shared matches and "ethnicities", and shared ancestral places), all of which previously required a MyHeritage site subscription. The health subscription will normally cost £89. However, I am somewhat concerned that the subscription is automatically renewed at the end of the year. In order to place my order I had to authorise MyHeritage to use my PayPal account for future payments.
I don't like the idea of having subscriptions automatically renewed so I have set myself a reminder to review the situation prior to the renewal. I have also discovered that there is an option buried deep in the settings to switch off the auto-renewal.

If you are ordering a test from MyHeritage there are a few things to watch out for from a privacy perspective. As part of the order process you need to provide information about your date of birth. There is of course no requirement that you have to provide the correct date but you need to be aware that with the default privacy settings your "general age" (eg, thirties, forties, fifties) is automatically displayed to other MyHeritage members. MyHeritage also automatically creates a profile for you and by default this is publicly displayed. The default settings are shown in the screenshot below. 



Note that it is not possible to buy a MyHeritage kit as a standalone health test. Also you need to be aware that anyone who orders the Health and Ancestry test is automatically opted in to the relative matching feature and to the sharing of "ethnicity" reports and matching segment data. If you do not wish to opt in to these features you will need to adjust your settings in the My Privacy section under DNA Preferences. MyHeritage probably ought to review some of these default settings for their European Union customers because they are counter to the basic principles of GDPR (General Data Protection Regulation) which requires explicit informed consent.

I recommend checking your MyHeritage settings immediately after placing an order to make sure you are only sharing information that you wish to share.

The MyHeritage Health and Ancestry test currently offers 14 genetic risk reports including reports for haemochromatosis, Alzheimer's, late-onset Parkinson's disease, coeliac disease and hereditary BRCA cancers. There are 13 carrier status reports including cystic fibrosis, sickle cell anaemia and a number of other conditions, most of which I have never heard of. You can see a full list of the health reports here.

There have been concerns about false positive health reports with microarray testing. One study reported a 40% false positive rate though this was based on a small sample of just 49 people and included reports from third-party tools as well as direct-to-consumer genetic testing companies. According to the MyHeritage blog, the company takes additional steps to avoid false positive results and to ensure that the results are accurate. For any condition where a person has a significantly increased genetic risk, MyHeritage will double-check the results with Sanger sequencing.

You can watch the video below for an overview of the new MyHeritage Health and Ancestry test.

If you are in the US you should watch the US version of the video. MyHeritage have partnered with PWN Health, a private American network of doctors and genetic counsellors. US reports are reviewed by a doctor and you are referred, if necessary, to a genetic counsellor. The US price is $199 which works out at about £160 so it appears that American customers are paying a small premium equivalent to about £19 to cover the costs of the medical service. They do not have the option of ordering a test without clinical oversight.

You need to be over 18 to order the MyHeritage Health and Ancestry test. It is available in most countries of the world. It is currently not on sale in the following countries: Israel, France, Germany, Austria, Switzerland, Iran, Libya, Sudan, Somalia, North Korea, Lebanon and Syria. US residents living in the states of Rhode Island, New Jersey and New York are also not able to purchase the test. The price will vary in different countries depending on local taxes and exchange rates.

Direct-to-consumer genetic health tests are also provided by 23andMe, but they only sell their test in a limited number of countries. They are also not able to provide health reports in many of the countries where they sell their test. MyHeritage's entry into this market will now make health reports available to a much wider international consumer base. It will be interesting to see what the take up rate is and if there will be any international regulatory implications. The UK Parliament's Science and Technology Committee has just launched an enquiry into consumer genomics and the effects on the National Health Service. We will need to wait and see what recommendations emerge from this enquiry.

Once I've returned my kit it should take about four to six weeks to get my results and I will report back on what I learn and how the results compare with my 23andMe reports. There are proposals in England to have a paid-for NHS genome sequencing service whereby healthy people will serve as "genomic volunteers" who will pay through the NHS for their genome to be sequenced and share the data. In the long run, I am likely to get more benefit from a UK-specific service which can be integrated with my health records as part of the NHS Genomic Medicine Service. However, such plans are in their infancy and, even if the service does eventually get off the ground, it will be useful to have data from other sources to serve as a comparison.

For further information about the MyHeritage Health and Ancestry test or to order a kit go to: https://www.myheritage.com/health


Thursday, 4 July 2019

Updated genetic communities at AncestryDNA

On 19th June AncestryDNA rolled out over 225 new communities for people with ancestry from Australia, New Zealand, Canada, the United Kingdom and "French North America". These new communities are in addition to the 92 regions in Ireland that were introduced in January this year.  If you have taken a DNA test with AncestryDNA and have ancestry from any of these places you will probably find that your results have been updated and that the regions are now more granular.


I previously had just one genetic community for Southern England with South-East England as a sub-region. I am now in the Central Southern England community and have two sub-regions: (1) Dorset and Somerset; (2) Gloucestershire, Wiltshire and West Oxfordshire. Here is a screenshot of my updated results.


My dad previously had two communities: (1) Southern England with East Anglia and Essex as a sub-region; (2) Wales and the West Midlands. My dad now has three communities: Central Southern England, Devon and Cornwall, and East of England. He does not yet have any sub-regions. Below is a screenshot of his updated results. Note that although Norway and Sweden appear in his "ethnicity" estimate he does not have any documented ancestry from either country in the last 400 years or so. I fully expect Norway and Sweden to disappear from his estimate the next time Ancestry update their reference populations.


My mum was previously in two communities: (1) Southern England with South East England as a sub-region; (2) Wales and the West Midlands. She is now in the Central Southern England community and has Dorset and Somerset as a sub-region. See below for a screenshot of my mum's updated results.


Previously it was possible to access the confidence score for each community/region. I can no longer find a way to do this. I hope this feature will eventually be restored. We also used to have the ability to filter matches by region. This option has now disappeared but I understand that there are plans to restore it.

For my family the results haven't changed too drastically but the new communities do provide a marginal improvement on the previous results and are a good reflection of some of our predominant ancestries. As more people test and more people are added to the genetic networks that form the basis of the communities we can expect the results to change over time. I imagine that it will eventually be possible to break the results down into more individual counties.

Have you had some updated communities? What do you think of your results?


Sunday, 2 June 2019

The end of public participation in the Genographic Project

It is the end of an era. The National Geographic Genographic Project has announced that the public participation phase of the project has been closed as of 31st May 2019. It is no longer possible to order a Genographic kit, but existing orders will be fulfilled within a limited timeframe with the date varying depending on which kit was ordered. There is further information on the Genographic Project website:


The Genographic Project has provided a detailed set of FAQs:


As of today's date, the Genographic Project has sold 997,222 kits in 140 countries.

There are no doubt many kits still waiting to be returned and it's possible that the project will eventually pass the one million milestone.

This was an almost inevitable development after Rupert Murdoch bought out the media arm of the National Geographic and ended its not-for-profit status. The new for-profit arm was re-named as National Geographic Partners and went into partnership with Disney in March this year. The National Geographic Society continues to operate as a non-profit organisation.

The Genographic Project was not without controversy. See for example the essay The brave new era of human genetics by Hans-Jurgen Bandelt, Yong-Gang Yao, Martin Richards and Antonio Salas published in 2008. The Native American researcher Kim Tallbear published a critique Narratives of race and indigeneity in the Genographic Project in 2007. Many population geneticists were critical of the fancy Y-DNA and mtDNA haplogroup stories provided as customer reports. Ancient DNA testing has now shown that we cannot use the DNA of living people to make inferences about past populations.

However, many genealogists first discovered the joys of genetic genealogy by testing at the Genographic Project. After transferring their DNA results to FamilyTreeDNA many people were then inspired to start their own surname projects, haplogroup projects and geographical projects.

The Genographic Project collected DNA from nearly 100,000 people from indigenous populations around the world. I understand they were waiting for the costs of whole genome sequencing to come down before starting to analyse all the data. This is a valuable resource and the scientific research will continue so we can look forward to many more interesting publications.

Anyone who has tested at the Genographic Project can transfer their data to the FamilyTreeDNA database:


Note, however, that Helix kits, which were sold exclusively in the US, cannot be transferred.

Genographic transfers will have the kit number prefixed by the letter N. Judging by the kit numbers in my projects at FTDNA, well over 200,000 people have already transferred their Genographic results to FTDNA.

When transferring to FamilyTreeDNA you need to be aware that if you participate in relative matching the company is now automatically opting all customers into Law Enforcement Matching. This means that law enforcement agencies in the US and their representatives who upload DNA profiles can see your name, your e-mail address and the amount of DNA you share with the law enforcement kits. Law enforcement matching is not restricted to US citizens but applies to the entire database regardless of country of residence. If you wish to opt out of Law Enforcement Matching you can do so from the Privacy and Sharing Page. If you wish to understand more about these issues you can read my article for Forensic Science International on Using genetic genealogy databases in missing persons cases and to develop suspect leads in violent crimes.

With thanks to Mats Ahlgren and Paul R Smith in the ISOGG Facebook group. See also Paul's blog post National Geographic Geno Project DNA ending.

Further reading
Genographic Project prepares to shut down consumer database by Roberta Estes, DNAeXplained

Saturday, 1 June 2019

Consuming genetics: ethical and legal considerations of new technologies - videos online

The Petrie-Flom Center at Harvard Law School recently held their annual conference which was devoted to the subject  of “Consuming genetics: ethical and legal considerations of new technologies”. They very kindly recorded all the talks and have made them available online. You can access them from this link:

https://petrieflom.law.harvard.edu/events/details/2019-petrie-flom-center-annual-conference

I've only had time to watch a few of the talks but so far they are all of very good quality. I highly recommend that you take time to watch the very moving talk from Kif Augustine-Adams on "Generational failures of law and ethics: rape, Mormon orthodoxy, and the revelatory power of Ancestry DNA". It is a first-hand account of the disruptive power of genetic ancestry testing and the effects on families when long-held secrets are uncovered and promises of anonymity are breached.

It's also worth watching Liza Vertinsky's talk on "Genetic paparazzi vs. genetic privacy". In the UK DNA theft is illegal thanks to the Human Tissue Act passed in 2004. If you test someone's DNA without their consent you could potentially be put in prison. In the US no such laws yet exist and it is possible to test so-called "abandoned DNA" from discarded items without the individual's consent. I suspect it's only a matter of time before a celebrity's privacy is breached by testing their DNA without consent which is likely to cause a big backlash and encourage the introduction of new legislation.

I also recommend watching Natalie Ram's session on "Genetic genealogy and the problem of familial forensic identification" which is very topical in light of the current debates about law enforcement usage of genetic genealogy databases. Natalie highlights the inter-relatedness of DNA which means that informed consent becomes a non-issue. Even if you don't want to upload your DNA to GEDmatch, if your sister exercises her right to share her DNA you could still be caught up in a criminal investigation and have your family tree and your social media accounts trawled by the police.

Thursday, 30 May 2019

Using genetic genealogy databases in missing persons cases and to develop suspect leads in violent crimes

Last year I was invited by Rob Davis, an editor at Forensic Science International, to write an article about the privacy issues relating to the use of genetic genealogy in cold cases. My article "Using genetic genealogy databases in missing persons cases and to develop suspect leads in violent crimes" has gone through the peer review process and has now been published online. You can access the full article through my special author's link which will be valid until 19th July 2019:

https://authors.elsevier.com/a/1Z8MC1MCG0LzX~

I hope the article will educate people about all the issues involved and encourage policy makers to work on some suitable best practice guidelines to ensure that the technology can be used both effectively and responsibly.

You can see the full list of articles in the special Cold Case issue here.

Sunday, 26 May 2019


A major milestone was passed this week by AncestryDNA who announced that their "consumer DNA network has reached over 15 million completed samples".

We are seeing a rapid growth in the ancestry testing market in the UK. According to a YouGov survey last month an estimated 4.7 million Brits have already used a DNA testing service.  AncestryDNA do not give breakdowns by country but anecdotally we know that they have the largest market share in the UK and it seems likely that perhaps as many as two million Brits are already in the AncestryDNA database.

Below is the press release I received from AncestryDNA which also includes news of some updates expected later this year.
LEHI, Utah and SAN FRANCISCO, California, Tuesday, May 21, 2019 - Today Ancestry®, the global leader in family history and consumer genomics, announced its consumer DNA network has reached over 15 million completed samples. With the company’s growing network and innovative research tools, Ancestry can now provide customers with even more DNA matches, further detailed ethnicity insights, and ultimately, help more people around the globe discover their unique family story. 
“I have had a front row seat as the genetic genealogy industry has grown from a spark of an idea to a global phenomenon that has made statements like ‘My DNA says I am...’ commonplace in grocery stores, office buildings and family dinners,” said Diahan Southard, founder and author of Your DNA Guide and genetic genealogy educator. “Ancestry has been at the forefront of innovation and played a central role in this growth by making science exciting for everyone and providing meaningful insights into our origins and relationships. Every researcher knows that the more data we have, the more complete our story. With a network this large, coupled with millions of digitized records, everyone is sure to find out more about their own story.” 
“Ancestry is honored to play a role in empowering the journeys of personal discovery for 15 million people around the world,” said Cathy Ball, Chief Scientific Officer, Ancestry. “The size of this community is a true sign of how deeply important it is for people to connect and learn about their past. As the network continues to grow, we can deliver even more value to our members, including more granular insights about heritage, and provide compelling new paths to learn about ourselves using genetics.” 
The growing AncestryDNA network, combined with cutting-edge technology and content additions, gives new and existing Ancestry members ongoing value and new, rich information with their DNA results: 
New Communities: As the AncestryDNA network grows, Ancestry scientists are able to refine and discover more communities using Ancestry’s patented Genetic Communities™ technology – a proprietary technology that can connect people through their DNA to the places their ancestors lived and the paths they followed to get there over the past 75-300 years. Ancestry recently released 94 new and updated AncestryDNA communities for customers of African American and Afro-Caribbean descent, with even more communities launching soon. 
Refined Ethnicity Insights: As more people take the AncestryDNA test, Ancestry scientists are able to add additional samples to the reference panel, paving the way for more refined insights for members about their genetically inherited ethnicity. Thanks to the largest consumer DNA network, AncestryDNA is preparing another update for later this year which will include new ethnic regions, providing members with a more detailed view of their heritage. 
Even More Matches and Customer Discoveries: The size of the AncestryDNA network directly increases the quality and quantity of discoveries people can make using tools such as DNA Matches, and one of our newest features, ThruLines™. ThruLines (currently in BETA) can show common ancestors that members may share with their DNA matches and give a clear and simple view of how all matches are connected through that shared ancestor. With this innovation, combined with millions of Ancestry member trees, family tree building has never been easier, and the discoveries people can make are unprecedented. Additionally, now that the AncestryDNA network has over 15 million members, each AncestryDNA customer receives an average of 50,000 total matches – and that number grows by 2%-5% each month as more people join the network.

Sunday, 24 March 2019

Advanced Genetic Genealogy: Techniques and Case Studies

I've been sworn to secrecy for the last two years but I am now pleased to announce the publication of a new book Advanced Genetic Genealogy: Techniques and Case Studies, edited by Debbie Parker Wayne. I contributed a chapter on "The promise and limitations of genetic genealogy" where I had great fun speculating about what the future holds for genetic genealogy. There are another thirteen chapters contributed by many well known names in the genealogy world. I've not yet seen the book or had the chance to review any of the chapters so I'm very much looking forward to reading it when my copy arrives.

Debbie Parker Wayne has worked really hard behind the scenes to bring this much-needed book to fruition and I am very grateful for her patience and encouragement.

The book is currently on sale on Amazon. On the US site the book is showing as being available for shipping within the next one to two days. On the UK site delivery is expected within the next one to two months. A Kindle version will be available in May 2019. For US readers who are going to the National Genealogical Society conference in May you will be able to buy a copy from Books and Things who will be exhibiting at the conference. For details see here.

Here is the description of the book from Amazon:
Advanced Genetic Genealogy: Techniques and Case Studies helps intermediate researchers move up to the next level and advanced researchers apply the new DNA standards and write about DNA. This new book offers an in-home course in advanced genetic genealogy. Case studies demonstrate analyzing the DNA test results, correlating with documentary evidence, and writing about the findings, all incorporating the updated standards for using DNA. Full-color illustrations help the genealogist incorporate these techniques into personal or client research projects. Each of the fourteen chapters was written by a professional genealogist with DNA experience. 
Eight chapters study real families (some using anonymized identities), including methods, tools, and techniques. Jim Bartlett covers how to triangulate a genome (mapping DNA segments to ancestors). Blaine T. Bettinger demonstrates the methodology for visual phasing (mapping DNA segments to the grandparents who passed down the segment to descendants, even when the grandparents cannot be tested). Kathryn J. Johnston shows how to use X-DNA to identify and confirm ancestral lines. James M. Owston describes findings of the Owston Y-DNA project. Melissa A. Johnson covers adoption and misattributed parentage research. Kimberly T. Powell provides guidance when researching families with endogamy and pedigree collapse. Debbie Parker Wayne combines atDNA and Y-DNA in a Parker family study. Ann Turner describes the raw DNA data and lab processes. 
Three middle chapters cover genealogy standards as they relate to DNA and documentary evidence. Karen Stanbary applies the Genealogical Proof Standard to genetic genealogy in a hypothetical unknown parentage case illustrating start-to-finish analysis. Patricia Lee Hobbs uses atDNA to identify an unknown ancestor and that ancestor's maiden name, moving back and forth between documentary and DNA evidence. Thomas W. Jones describes best practices for genealogical writing and publishing when incorporating DNA evidence. 
Three concluding chapters deal with ethics, emotions, and the future. Judy G. Russell covers ethical considerations. Michael D. Lacopo describes the effect on relationships when family secrets are uncovered, surfacing issues for all concerned. Debbie Kennett covers the current limitations and future promise of using DNA for genealogy. An extensive glossary, list of recommended resources, and index are included.
Disclosure
If you click on the Amazon UK links in this blog it is vaguely possible that at some point in the distant future I might receive a microscopic payment from Amazon as part of their affiliate scheme to help support my writing. Using the affiliate links makes no difference to the prices you pay. 

Tuesday, 5 March 2019

Ancestry updates at Rootstech – ThruLines, Tree Tags and Improved DNA Matches

At Rootstech last weekend Ancestry announced the launch of three new features: Tree Tags, ThruLines, and New and Improved Matches. The announcement was made in the keynote speech by Margo Georgiadis, Ancestry's new CEO, which you can now watch on YouTube.



Margo stated that Ancestry host 100 million family trees, and that they "will soon have more than 15 million people" in their DNA network, making it the largest consumer DNA database in the world.

Crista Cowan provided a live demonstration of the new tools in her presentation "What you don't know about Ancestry". The recording of her talk is now available on the Rootstech website and is well worth watching if you want to get a good overview of the new features. I also recommend watching Diahan Southard's talk on Connecting your DNA matches, which shows how to use the tools to form genetic networks, but also highlights some of the limitations.

The Tree Tags and New and Improved Matches are currently in beta testing. You can opt in to the beta by accessing the new AncestryLab menu on your Ancestry account. This can be found under the Extras tab. ThruLines has been rolled out to the entire AncestryDNA database and you will see the feature when you log into your DNA account. The screenshot below shows what my home page now looks like.



The new matches experience replicates the functionality of the third-party Chrome extensions MedBetter DNA and DNA Match Labelling, which many of us found very helpful for managing our matches. These extensions are now redundant. If you have previously used the extensions you need to be aware that they won't work with the new system. Before participating in the beta you might want to make a note of all your groupings and tags so that you can transfer them to the new interface.

Because these new tools are all in beta testing it's important to remember that the features may change and new functionality might be added. If you spot any bugs or have suggestions for improving the tools make sure you submit feedback to Ancestry.

So now let's have a look at the Improved Matches and ThruLines features and see how they work in practice.

New and improved DNA matches
I currently have 24,900 matches at AncestryDNA, which is far more than I could ever possibly hope to investigate. Any tools that will help me to sort and filter these matches to find the most useful ones are always going to be welcome. Below is a screenshot of my new matches page. I've tested both my parents and matches are now allocated to the father's side and the mother's side. However, the parental sides are only shown for my 162 fourth cousin and closer matches. It would be really helpful if Ancestry could extend this tool to show the sides for the more distant matches, or at least for the first 1000 or 2000 matches on your match list. You will also notice that the amount of shared cM has been rounded to the nearest whole number.


I always write fastidious notes about my matches and the notes can now be viewed directly from the matches page without having to click through to view the match. This functionality was previously only available when using the MedBetter DNA extension. You won't be able to see the full note on the matches page, but you can click on the note symbol, as in the screenshot above, to read the full text.

Previously we could view 50 matches on a page. Now there is an infinite scroll system which means that you can keep scrolling down the page and see more and more matches. This is very handy if you're trying to search by keyword for particular matches. I can now, for example, see all my fourth cousin and closer matches on a single page. Previously you could use the page numbers to work out how many pages of matches you had, which then allowed you to calculate the total number of matches. I can't currently see any way of replicating this function with the new system. (*See the update at the end of this article.) If you want to know how many matches you have you can use the DNAGedcom Client to download your entire match list. The Client is available for a modest subscription from DNAGedcom.

The updated match list has a number of new filters for sorting your matches. You can sort by close matches, distant matches, matches you haven't viewed, matches with notes, matches you've messaged and tree status (private linked trees, public linked trees and unlinked trees). Unfortunately the facility to filter matches by sub-region has temporarily been lost but I understand that it will eventually be restored.

The good news is that it's now possible to create custom groups which can be labelled and assigned a colour. This new feature is modelled very closely on the DNA Match Labelling extension, but provides additional functionality such as the ability to filter matches by custom group. You can also have more than one coloured dot for each match. There are 24 different colours available. I'm still experimenting with the coloured dots but an obvious use of the custom groups is to assign different colours to specific surnames or ancestral couples.

I've added a coloured dot for matches which don't have any shared matches. I can use this filter to go back and check these matches from time to time to see if they do now have any shared matches. Currently 23 of my 162 fourth cousins or closer (14%) don't have any shared matches. It will be interesting to see if the percentage drops over time as more matches start to come in.

I also have a coloured dot for what I call "dodgy matches". These are matches which share substantially less DNA with my parents than they do with me and are therefore not likely to be worth pursuing. Currently 29 of my 162 fourth cousin or closer matches (18%) fall into this dodgy category. In some cases there is a discrepancy of 10 cM or more. One match appears to share 22.3 cM with me but shares just 6.5 cM with my dad. Two of my dodgy matches actually share small amounts of DNA with both of my parents. One appears to share a single 21.2 cM segment with me but this is actually two segments: a 7.8 cM segment shared with my mum and a 6.5 cM segment shared with my dad. My second double-sided match shares 20.1 cM across 2 segments with me but this translates into a 6.1 cM segment shared with my mum and an 11.9 cM segment shared with my dad. I've not found any genealogical connection between my mum and dad within the last 400 years and these double matches are likely to be signals of very distant sharing dating back hundreds of years.
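The check I do by hand for these dodgy matches can be sketched in a few lines of Python. This is only an illustration using the figures quoted above: the function names and the 10 cM cut-off are my own assumptions and are not part of any Ancestry tool.

```python
def discrepancy_cm(child_cm, father_cm=0.0, mother_cm=0.0):
    """Return how much more DNA (in cM) a match shares with the child
    than with the better-matching parent. Hypothetical helper."""
    best_parent_cm = max(father_cm, mother_cm)
    return round(child_cm - best_parent_cm, 1)


def is_dodgy(child_cm, father_cm=0.0, mother_cm=0.0, cutoff_cm=10.0):
    """Flag a match whose discrepancy meets an assumed cut-off (10 cM here)."""
    return discrepancy_cm(child_cm, father_cm, mother_cm) >= cutoff_cm


# The three matches described above: (shared with me, with dad, with mum)
matches = [
    (22.3, 6.5, 0.0),
    (21.2, 6.5, 7.8),
    (20.1, 11.9, 6.1),
]

for child, father, mother in matches:
    gap = discrepancy_cm(child, father, mother)
    print(f"{child} cM with me, discrepancy {gap} cM, flagged: {is_dodgy(child, father, mother)}")
```

With a 10 cM cut-off the third match would not be flagged, so in practice the threshold is a judgment call rather than a fixed rule.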

The screenshot below shows some of the custom groups I've started to use. No doubt my system will evolve over time. The coloured dots currently sort alphabetically so I might at some point decide to introduce a numbering system to get them to sort in a particular order.

Common ancestors
AncestryDNA's shared ancestor hints, otherwise known as shaky leaf hints, have now been renamed as common ancestors. I previously only ever had six shared ancestor hints and two of those were with my parents. I now have 20 common ancestors, though again two of those are with my parents. The shaky leaf hints only used our family tree and the trees of our matches to identify a shared ancestor. The common ancestors feature compares family trees and then deploys the power of other people's family trees to try and identify the common ancestor. The common ancestor feature is a useful tool to guide your research but every link will need to be carefully checked, and extreme caution needs to be exercised when matches are identified with more distant cousins.

Below is an example of a predicted common ancestor with a half sixth cousin. Ancestry have used my tree and my cousin's tree and then used a third-party tree to link the two lines together. Tidbury is indeed a name which appears in my family tree. The surname is concentrated in Berkshire and Hampshire and it's quite likely that I share a genealogical relationship with this match. However, the Tidburys are on my mother's side and this cousin matches me through my father so there is clearly no genetic connection. Even if the genetic connection was real, the total amount of DNA shared is very small and there is currently no way of assigning a single small segment like this to a sixth cousin with confidence.


ThruLines
ThruLines is a replacement for DNA Circles. The DNA Circles feature will continue in parallel to ThruLines for now but is no longer going to be updated. While DNA Circles was an interesting concept, in practice it was only of use to a subset of the AncestryDNA database because of the strict requirements needed to create a circle. I only ever had one DNA Circle, but AncestryDNA have identified 57 potential ancestors for me with the ThruLines feature.

ThruLines is now in open beta testing and any Ancestry member who meets the following criteria will receive the feature free of charge for a limited time:
  • Your AncestryDNA results must be linked to a public or private searchable family tree.
  • You must have DNA matches who have also linked their results to a public or private indexed family tree.
  • Your linked family tree needs to be well built out. It should be 3-4 generations deep to have the best chance of ThruLines finding new discoveries for you to explore.
AncestryDNA have said that the feature will come out of beta testing when they have had enough feedback to validate the value of the tool to their customers, including whether or not the feature will require a subscription.

With ThruLines you get a DNA record card for each ancestor for whom you have a DNA match. If you click on the card you will get a report showing the possible pathways through which you and your matches are connected to the ancestor. The pathways are filled in not just from your own family tree or the family trees of your matches but from multiple Ancestry family trees. This is essentially a machine-learning algorithm deploying the power of big data to make connections. I would guess that, like the We're Related App, the feature is generated by the AncestryDNA Big Tree. For information about the Big Tree I recommend reading the very informative blog post by Randy Seaver "Is there really an Ancestry.com Big Tree?"

You can filter the ThruLines in three ways.
  1. If you filter by ancestors from your linked tree you will be given descendant reports on your matches who meet the required criteria and who share the ancestors featured in your tree. This is a very useful way of locating matches who descend from a specific ancestor.
  2. If you filter by potential ancestors you will be able to view links that Ancestry have made between you and your matches by stitching together multiple family trees to make connections.
  3. The third option is all ancestors which will show you all your potential ancestors and all ancestors from your linked tree.
ThruLines goes back seven generations to fifth great grandparents which means that you will not get potential ancestor hints with anyone who is not a sixth cousin or closer.
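The arithmetic behind that limit is simple enough to write down. The sketch below is just an illustration with made-up function names: it counts parents as generation 1 and assumes that you and your match are the same number of generations from the shared ancestors, so there are no "removed" relationships.

```python
def greats(generations_back):
    """Number of 'greats' before 'grandparents' (grandparents = generation 2)."""
    if generations_back < 2:
        raise ValueError("grandparents are the first generation with a 'grand' prefix")
    return generations_back - 2


def cousin_degree(generations_back):
    """Cousin degree for two people who both descend from ancestors
    `generations_back` generations above them (shared grandparents = 1st cousins)."""
    if generations_back < 2:
        raise ValueError("cousins share grandparents or earlier ancestors")
    return generations_back - 1


limit = 7  # ThruLines goes back seven generations
print(greats(limit))         # 5, i.e. fifth great grandparents
print(cousin_degree(limit))  # 6, i.e. sixth cousins
```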

The example below shows a genetic descendancy report for my great-great-great-grandfather Samuel Trask. It shows my line of descent on the left and then the lines of three of my matches and their lines of descent down to the present day with the speculative links highlighted with dotted boxes. In this particular case the connections were hard to make because these matches were related to me through a female line. I hadn't noted these changes of surname upon marriage in my family tree so I might not have spotted these connections easily without this feature. However, caution will need to be exercised and the trees of these matches will need to be evaluated independently. I will also have to check the amount of shared DNA to see if it is consistent with the hypothesised relationships.


The example below shows a potential ancestor that Ancestry have identified for my great-great-great-grandfather Thomas Thorn. Two cousins who are potentially related to me through Thomas Thorn are shown, and Ancestry have also identified William Thorn as Thomas's possible father because of a DNA connection through Robert Browning Thorne. I have insufficient evidence at present to evaluate this possible link but it may be that I will get more matches on these lines in future which will provide further clues.


One disadvantage of the ThruLines is that the potential ancestor feature is triggered even if your only DNA match is with a parent. As I've tested both of my parents I'm finding that a lot of the potential ancestors are only appearing because of a single parental match. Ideally parents should be excluded from the feature or there should be the ability to filter out matches with parents.

The feature seems to be most useful for identifying how you are connected to your DNA matches. It's particularly helpful for the fifth to eighth cousin matches which are otherwise very difficult to search. In many cases the matches only share very small amounts of DNA under 10 cM and I'm not convinced that the DNA match is a result of the documented genealogical connection. Nevertheless, it's still useful to identify more genealogical cousins and to have the opportunity to communicate with them about our shared surnames and family trees.

Probabilities
A nifty bonus feature of the Improved DNA Matches and the ThruLines is that you can now generate a table showing the possible relationships and the probabilities of those different relationships. This is very similar to the Shared cM Tool on the DNA Painter website. The DNA Painter tool uses probabilities based on simulated data from the AncestryDNA Matching White Paper combined with ranges derived from the Shared cM Project. These new tables from Ancestry are based on simulations. They've not released any details but I'm hoping that they will publish a white paper or blog explaining the methodology behind their calculations. You can access the probabilities table from your new matches page by clicking on the question mark next to the amount of shared DNA. You can access the tool from the ThruLines feature by clicking on the amount of DNA you share with your matches in the descendancy chart.

Here is the probability table from Ancestry for one of my second cousins with whom I share 168 cM.

Interestingly, Ancestry have predicted that this person is my third to fourth cousin whereas the probabilities show that he is more likely to be a second cousin.

If I plug the amount of shared DNA (168 cM) into the Shared cM Tool at DNA Painter I get the following probability table which suggests that this cousin is slightly more likely to be a half second cousin than a full second cousin but still has a reasonable probability of being a second cousin.
I've found that Ancestry's probabilities are not so realistic for the more distant relationships in the fifth to eighth cousin category where people share much smaller amounts of DNA. Most of the matches identified with the ThruLines feature appear to fall into this category. The Ancestry simulations don't seem to have gone beyond the fifth cousin level. As a result, the probabilities suggest that almost all of your DNA matches, however little DNA they share, will be fifth cousins or closer. Here's the probability table for the predicted half sixth cousin I mentioned above who shares just 10 cM with me. The table suggests that 98% of cousins sharing 10 cM will be fifth cousins or closer. A half sixth cousin doesn't even appear as a possibility.


This prediction is actually in conflict with Ancestry's confidence scores. Matches sharing 6-16 cM are classified as moderate confidence matches with only a 15-50% chance of sharing a single recent common ancestor. Ancestry say that for moderate confidence matches "You and your match might share DNA because of a recent common ancestor or couple, share DNA from very distant ancestors, or you might not be related."

Two separate studies have also shown that 10 cM segments can potentially go back a long way and can perhaps date back twenty or thirty generations. A chart from a 2012 study by Speed and Balding reproduced in the ISOGG Wiki shows that less than 40% of 10 cM segments are likely to fall within the last 10 generations. The Speed and Balding paper used computer simulations. A 2013 study from Ralph and Coop, using real life data from the European POPRES dataset, showed that while most 10 cM segments are likely to come from the last 500 years, we have many more very distant cousins so the vast majority of 10 cM segments will be shared through very distant ancestors. The authors go on to say that "the typical age of a 10 cM block shared by two individuals from the United Kingdom is between 32 and 52 generations". The distribution will vary depending on a given population's shared history.

I would suggest that Ancestry need to refine the parameters of their simulations to allow for more distant sharing and to produce more realistic probability tables for matches sharing low amounts of DNA. In the meantime, when weighing up the strength of DNA evidence, you need to bear in mind the uncertainties in the predictions when sharing small amounts of DNA. If you share a single 10 cM segment with a cousin there is currently no way of determining whether you are matching as sixth cousins or sixteenth cousins.

Conclusion
These new features from Ancestry are very welcome. The new matching interface is already making it much easier to navigate and sort our matches. ThruLines is likely to be a valuable tool for identifying potentially interesting matches to follow up. The predictions are likely to get better over time as more data becomes available and as the machine learning improves. Remember that any tool has to be used carefully. It provides clues but does not give you all the answers. You will still need to evaluate the DNA evidence in combination with the genealogical evidence.

Update 7th March 2019
Check out the All Matches filter in your new match list. AncestryDNA have now included stats on the total number of matches, the number of close and distant matches, the number of new matches and matches shared with your mother or father. I now have 25,089 matches, 166 of which are predicted to be fourth cousins or closer.

Further reading and resources

Thursday, 3 January 2019

What we learned about fighting bad science by taking on a genetic ancestry testing company

The following blog post was written by David Balding and Debbie Kennett. It is based on an article written in collaboration with Mark Thomas and Adrian Timpson entitled The rise and fall of BritainsDNA: a tale of misleading claims, media manipulation and threats to academic freedom, published in the peer-reviewed journal Genealogy. In just a few weeks the article has achieved the distinction of being the most viewed article in the journal's history. As of today's date it has been seen 3,658 times, and 2,190 people have downloaded a copy of the article. The blog post was originally intended for publication in The Conversation. However, the piece was subsequently rejected because the website's lawyer considered that it was "potentially defamatory in its current state". The Defamation Act of 2013 includes a provision for matters of public interest and provides special privileges for statements published in peer-reviewed journals. We believe that there is a strong public interest in highlighting this story. It is important that academic debate is not stifled by legal threats. There is nothing in the blog post which is not already referenced in our peer-reviewed article. We have therefore published it below in its entirety.

The worlds of academia and industry are getting closer than ever before. Academic scientists are encouraged to engage directly with industry through consultancy roles, and to commercialise their research through the creation of new enterprises. At the same time, research institutions encourage promotion of resulting new findings to a broad public through the news media.

These trends can lead to conflicts of interest. Media savvy companies can and do attract free coverage for their science-related business under the guise of a public interest science story. It is possible that universities could collude with this deception in their eagerness to attract media attention by allowing a scientist to use the university brand in media presentations, without acknowledging the business motivation.

Our new case study of the former consumer genetic ancestry testing company BritainsDNA, published in the journal Genealogy, sheds light on how conflicts of interests can play out in reality.

Genetic ancestry tests are important tools for genealogists when used in combination with documentary and historical records. Y-chromosome DNA (Y-DNA) tests can be used to trace a man’s paternal ancestry, while mitochondrial DNA (mtDNA) provides information about ancestry on the direct maternal line. There are also autosomal DNA tests (the autosomes are the chromosomes other than the X, Y and mtDNA, and contain most of your DNA) which are useful in finding matches with genetic relatives in a database. Autosomal DNA tests are now the most popular tests. Ancestry testing is a multi-million-dollar industry, and around 18m people have now tested worldwide.

Such tests can be very reliable at revealing ancestry in recent generations. However, once you go beyond about 10 generations back, only a small fraction of the DNA of ancestors will have contributed to a living individual’s DNA. So while there’s a lot of research on human history through DNA, there is little that can be said that is specific to the customer. That means these tests cannot be used on their own to determine exactly where you came from.
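The scale of the problem can be illustrated with a rough calculation. The sketch below is a simplification with hypothetical function names: it counts ancestral "slots" in a pedigree, which double every generation, and the average autosomal share per slot. It assumes no pedigree collapse, and real inheritance is far more uneven than the average suggests, with some distant ancestors contributing no DNA at all.

```python
def ancestor_slots(generation):
    """Number of positions in the pedigree at a given generation (parents = 1)."""
    return 2 ** generation


def average_share_percent(generation):
    """Average percentage of autosomal DNA per ancestor slot at that generation."""
    return 100 / ancestor_slots(generation)


for generation in (2, 5, 10):
    slots = ancestor_slots(generation)
    share = average_share_percent(generation)
    print(f"generation {generation}: {slots} ancestor slots, about {share:.3f}% each on average")
```

At ten generations there are 1,024 slots in the pedigree, so the average contribution of any one of them is under a tenth of one percent.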

The case of BritainsDNA
BritainsDNA was active before the growth of the autosomal DNA databases and focused on Y-DNA and mtDNA testing. They were able to achieve substantial favourable coverage in newspapers, radio and television, with stories drawing questionable links for example between contemporary British people and the Queen of Sheba.

In another promotion, the public service Welsh-language TV channel S4C ran a five-part series called “DNA Cymru” investigating the question of “Who are the Welsh?”. To participate, members of the public were invited to buy a Y-DNA or mtDNA test from the company’s Welsh website.

But the results of this “research” were not published in a scientific journal. Instead viewers were regaled with stories about the ancestry of celebrities, for example, that their Y-DNA or mtDNA results indicated they were ancient Welsh, pioneers or Rhinelanders. Yet these descriptions are so generic that they apply to ancestors of almost anyone: they are essentially meaningless. Y-DNA and mtDNA comprise just 2% of our DNA, and convey very limited information about the history of a nation.

So how could this happen? Two principal actors in the company were a geneticist from The University of Edinburgh and an historian and former television executive who at the time held an unpaid position as Rector of St Andrews University.

The university roles of the company’s directors were used to lend credibility to the promotions. The media outlets did not seek the views of other scientists, who would have contested many of the claims. Few journalists have scientific training, which can allow sensationalised or unbalanced reporting. And while most scientists can be relied upon to be objective, journalists need to be aware that research-related commercial interests can affect scientists’ motivations.

Challenging the claims
We formed part of a small group of concerned scientists who tried to challenge this avalanche of marketing disguised as science. This was prompted by an interview on the prestigious BBC Radio 4 Today programme, which described a “massively subsidised” project to study the DNA of Britons as “bringing the Bible to life”. In fact, the interviewee was there to sell DNA tests and the BBC interviewer turned out to be an old chum. Our challenges were met with legal threats from the company, and resistance to acknowledging editorial failure from the BBC and other media.

Many scientists don’t speak out because of a fear of legal action. We were fortunate to have strong support from the then UCL Provost, and from many colleagues. So we decided to continue to challenge the misleading claims. We were also encouraged by the science writer Simon Singh, who had himself been sued by the British Chiropractic Association for critical comments made in The Guardian. Although the case against Singh was eventually dropped, he suffered years of personal stress and substantial unrecovered legal costs. We also received support from the charity Sense About Science, and we worked with them to prepare the pamphlet Sense about Genetic Ancestry Testing and an article on Sense about Genealogical DNA Testing.

The satirical magazine Private Eye was the only media outlet to see through the company’s misleading media campaign from the start. But eventually we had complaints upheld by the BBC, which also aired a radio documentary that partly corrected previous claims.

Over a period of years we got the upper hand. The company did not pursue its legal threat and eventually went out of business. With a move to genome-wide genetic data, containing more information than is available from Y-DNA and mtDNA, there is reduced scope for fanciful storytelling today. However, there remain problems with ancestry companies failing to reveal limitations of their analyses or to indicate uncertainty in inferences. The population labels that are used are not well defined and can conform to outdated notions of race and identity.

Our story has wider implications about the relationships between business and academia and the reporting of science stories in the media. We hope that our case study will be used to inform media training and education programmes, and that universities monitor the abuse of academic position to advance business interests. Most importantly, we hope that other scientists will be encouraged by our experience and will not be afraid to speak out against bad science.

Further reading
Academics pan Melrose-based DNA business - an overview of our paper from Ewan Lamb on the Not Just Sheep and Rugby blog.
Talking Headlines with Debbie Kennett - My interview with Talking Headlines about our BritainsDNA paper, the lessons learnt and how to detect fake science news.