Tuesday, 21 July 2020

Malicious phishing attempt at MyHeritage

Following on from the recent security breach at GEDmatch, there has now been a malicious phishing attempt at MyHeritage which is possibly linked to the GEDmatch breach. Thanks to the prompt actions by MyHeritage staff the threat appears to have been averted but make sure you watch out for fake e-mails purporting to come from the company. 

It is quite possible that the other genetic genealogy companies will similarly be targeted for phishing attacks so be alert and look out for any suspicious e-mails and check the reply field to ensure that the e-mail is legitimate.
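For readers who save a suspicious message as a raw text file, the reply-field check can be sketched in a few lines of Python using the standard library's email module. This is only an illustration: the function names and the example.com domain are hypothetical, and because the From header can itself be forged, a matching domain does not prove that a message is genuine.

```python
from email import message_from_string
from email.utils import parseaddr


def sender_domains(raw_message: str) -> dict:
    """Return the domains found in the From and Reply-To headers."""
    msg = message_from_string(raw_message)
    domains = {}
    for header in ("From", "Reply-To"):
        # parseaddr splits "Name <user@host>" into (name, address)
        _, address = parseaddr(msg.get(header, ""))
        domains[header] = address.rpartition("@")[2].lower()
    return domains


def looks_suspicious(raw_message: str, expected_domain: str) -> bool:
    """Flag a message whose From or Reply-To domain is not the expected one."""
    domains = sender_domains(raw_message)
    # Reply-To is optional; when it is absent, replies go to the From address
    reply_to = domains["Reply-To"] or domains["From"]
    return domains["From"] != expected_domain or reply_to != expected_domain
```

A message that claims to come from one domain but directs replies to another would be flagged, which is the pattern to look out for.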

You can read about the MyHeritage incident in their blog post:
 
Security alert: malicious phishing attempt detected, possibly connected to GEDmatch breach

My post on the GEDmatch security breach has been updated several times since I published it on Sunday so do check back if you want to keep on top of all the developments.



Sunday, 19 July 2020

Major privacy breach at GEDmatch

There has been a major privacy breach at GEDmatch, the third-party genetic genealogy website which has become well known in the last two years because of its use by law enforcement agencies in the US to solve cold cases. A member of the Genetic Genealogy Ireland Facebook group posted a message at lunchtime today (1.38 pm UK time) to advise that the site had been compromised and that people were receiving what appeared to be fake matches with suspicious e-mail addresses. (This Facebook post has now been deleted.) Some users were reporting that they were receiving unusually large numbers of new matches, all sharing unexpectedly high amounts of DNA. In another group, one user reported receiving over 3000 matches, all of which shared over 700 cM. A match in this range would normally indicate a very close relationship such as a first cousin or closer.

Later on this afternoon (2.54 pm UK time) a user posted in the Genetic Genealogy Tips and Techniques group on Facebook that all his kits on GEDmatch were now publicly accessible and all marked as available to the police. This included not just standard kits but also phased kits and Lazarus kits, which are by default always marked as research kits and are not normally available for matching. I checked my own account at GEDmatch and found that all my kits had been changed without my consent to allow police access. This included two phased research kits which were never intended to be made public. I initially found that I was unable to change the settings on any of the kits. The site was up and down for a short while this afternoon before I was finally able to log in and restore my preferred access settings.

Since then GEDmatch has been offline with a message that the site is down for maintenance.

Many other people have also reported that their kits have been affected and that the settings have been changed to allow police access without their consent. Graham Coop shared on Twitter this afternoon a screenshot of his accounts showing how they had all been changed to allow police access.


It therefore appears that the entire database has been changed to make all kits available for police access. This also means that the law enforcement kits, which are normally uploaded as research kits so that they do not appear in match lists, have been compromised. Anyone logging onto the website during this period would have seen those kits and might have been able to save a screenshot with the kit numbers. Allowing unauthorised access to law enforcement kits could potentially have serious consequences and could compromise an investigation.

This is clearly a matter of great concern. There are well over 1.2 million profiles on GEDmatch but only around 200,000 or so kits had opted to make their profiles available for law enforcement matching. This means that the DNA profiles and e-mail addresses of probably around a million people have been exposed, including all the law enforcement kits. It is unlikely anyone would have been able to do anything with the matches during the period when the website was compromised because so many spurious matches were being produced. It is the exposure of the e-mail addresses and kit numbers which is likely to be of the most concern.

According to a report on the Tech in the City website the original privacy settings were restored before the site was taken down, though I'm not sure what time this happened as it's not clear which time zone the author is reporting from.

As GEDmatch operates in the European Union and has many EU customers, they are obliged to comply with the EU's General Data Protection Regulation (GDPR). Because of the serious nature of this breach it seems likely that they will have to report the matter to the appropriate regulatory authority in the EU. I don't know which authority they have registered with but the Information Commissioner's Office in the UK has information on how such data breaches should be reported. If a company or organisation has not protected the security of its customers then enforcement action can be taken and the company can be fined.

GEDmatch have since advised that they are aware of the issues and are responding. According to a post in the GEDmatch User Group on Facebook GEDmatch are "doing research right now to confirm what is happening. They are leaving the site down until they can clearly confirm what is going on." They are expected to make a formal statement later. It appears that this was an inadvertent update that went wrong. There appears to be no evidence that the site was hacked.

In the meantime it is pointless to speculate about what might have happened and we will need to wait until further information is available. I will update this page if I receive any further news.

Update
Just after publishing this blog post I discovered (10.51 pm UK time) that GEDmatch is back up and running and my kits all have the correct access levels.

11.09 pm The following message has been posted on the GEDmatch Facebook page.

Update 21 July 2020
GEDmatch have announced on their Facebook page that they experienced a security breach on Sunday which was orchestrated through a sophisticated attack on one of their servers via an existing user account. The site was functioning briefly yesterday but reports started coming in late last night that people were once again receiving lots of unexpectedly high matches with a low SNP overlap in their match lists. I was able to briefly log into my account at 1.00 am last night and found that the kit I checked had lots of matches with users with words like "imputed" and "partial" in the names. My highest match was at the first cousin level with a user from the Chinese company Gese DNA. The site has now been taken down and GEDmatch are working with a cybersecurity company to implement new security measures. Here is a screenshot of the message from GEDmatch. I've removed the contact details from the post but these are available in the full version of the message in the Facebook group.


It is good that GEDmatch are being transparent about the problems and this may turn out for the best in the long run if the security of the database is improved. The site was down for at least three hours and although they say that no data was downloaded in that time it would have been possible to take screenshots of match lists from many different accounts. Once you have a kit number you then essentially have access to that individual's account. It is also a cascading effect because you can click on all the matches of the matches as well. This essentially means that all the kit numbers have been compromised because no one will know which kits were affected. All the kit numbers will need to be changed. Ideally it would be better if GEDmatch did not reveal kit numbers in the match lists. It will be interesting to see what happens but I rather suspect the site will be down for a long time.

Further update 21 July 2020
5.00 pm 
From the GEDmatch Facebook page: "GEDmatch will remain offline for 2 to 3 days as we further enhance security protocols. Thank you for your patience. We apologize for the inconvenience this has caused."

Update 22 July 2020
MyHeritage advised late last night of a security alert involving a malicious phishing attempt that was possibly related to the GEDmatch breach. For full details see the MyHeritage blog post:


The further reading section of this blog post has been updated to include an informative blog post from Leah Larkin explaining why we were seeing the mystery matches at GEDmatch sharing unusually high amounts of DNA. I have also included an official statement from Verogen which was published on their blog on 20th July, a further blog post from Leah Larkin which includes a timeline of the events and an article from Peter Aldhous of Buzzfeed News.
 
An e-mail has been sent out by Verogen to all GEDmatch users informing them of the breach. My e-mail arrived at 8.40 am. It may take time for a bulk e-mail to reach all 1.2 million or more users. If you haven't received the e-mail check your spam folder. I've copied the text below in case you haven't received it.

Dear GEDmatch member,

On the morning of July 19, GEDmatch experienced a security breach orchestrated through a sophisticated attack on one of our servers via an existing user account. We became aware of the situation a short time later and immediately took the site down. As a result of this breach, all user permissions were reset, making all profiles visible to all users. This was the case for approximately 3 hours. During this time, users who did not opt-in for law enforcement matching were available for law enforcement matching, and, conversely, all law enforcement profiles were made visible to GEDmatch users.

On Monday, July 20, as we continued to investigate the incident and work on a permanent solution to safeguard against threats of this nature, we discovered that the site was still vulnerable and made the decision to take the site down until such time that we can be absolutely sure that user data is protected against potential attacks. It was later confirmed that GEDmatch was the target of a second breach in which all user permissions were set to opt-out of law enforcement matching.

We can assure you that your DNA information was not compromised, as GEDmatch does not store raw DNA files on the site. When you upload your data, the information is encoded, and the raw file deleted. This is one of the ways we protect our users’ most sensitive information.

Further, we are working with a leading cybersecurity firm to conduct a comprehensive forensic review and help us implement the best possible security measures. We expect the site will be up within the next day or two.

We have reported the unauthorized access to the appropriate authorities and continue to work toward identifying the individuals responsible for this criminal act.

Today, we were informed that MyHeritage customers who are also GEDmatch users were the target of a phishing scam. Please remember to exercise caution when opening emails and clicking links. Never provide sensitive information via email. If an email seems suspicious, contact the company in question directly through the phone number or email address listed on their website, not via a reply to the suspicious email. You can reach GEDmatch at  xxxx or xxxxx [email address and telephone number removed]. At this time, we have no evidence to suggest the phishing scam is a result of the GEDmatch security breach this week. We are continuing to investigate the incident.

Please be assured that we take these matters very seriously. Our Number 1 responsibility is to protect the data of our users. We know we have not lived up to this responsibility this week, and we are working hard to regain your trust. We apologize for the concern and frustration this situation has caused.

Sincerely,

Brett Williams
CEO, Verogen Inc.

For a French translation of this e-mail see the post in the Facebook group France ADN - Généalogie Génétique (ISOGG).

Update 25th July 2020
There is a notice on the GEDmatch Facebook page suggesting that the site will be back online today though at 11.35 am UK time the site was still down.

The site was restored in the afternoon of 25th July and no further issues have been reported to date.

Tuesday, 14 July 2020

Some updates to AncestryDNA's matching system and a database update


Ancestry announced at a conference call today that there are some changes in the pipeline in terms of how our matches are reported. There will be three main changes:

1) Ancestry will provide a more accurate report on the number of segments shared with your matches. The updated matching algorithm may reduce the estimated number of segments you share with some of your DNA matches. However, it won't change the estimated total amount of shared DNA (measured in centimorgans/cM) or the predicted relationship to your matches.

2) Ancestry will report the length of the largest shared segment. This is particularly important for people who are descended from endogamous populations. Knowing the length of the longest segment you and a DNA match have in common can help determine if you’re actually related. The longer the segment, the more likely you’re related. Segment length is also the easiest way to evaluate the difference between multiple matches that all show the same estimated relationship.

3) The matches will be re-calibrated to remove false matches so that the reported matches are more likely to be related through a recent common ancestor. Once the update is implemented, only matches which share 8 cM or more will be reported. Ancestry estimate that this will remove about two thirds of the false matches. All matches that fall below the new threshold will disappear from your match list with the exception of matches you have messaged, matches where you've added a note and matches you have added to a group by using the system of coloured dots. Starred matches will also be retained as they are considered part of a group. If you save a match below 8 cM, your match will also have it saved without needing to take any additional action. Any matches sharing less than 8 cM in total will no longer appear as common ancestor hints or in the ThruLines feature and this change may affect the number of ThruLines you see. If you want to save these matches you'll need to make sure you add them to one of your groups or add a note. Note that it is only the total cM shared after the application of the Timber algorithm that is affected so you could still have matches which share some individual segments that are smaller than 8 cM so long as the sum total of all the segments is 8 cM or more.
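The retention rule described in point 3 can be summarised as a short Python sketch. This is my own illustration of the rule as announced, not Ancestry's code, and the Match class and its field names are hypothetical.

```python
from dataclasses import dataclass

THRESHOLD_CM = 8.0  # new minimum total shared cM, as announced


@dataclass
class Match:
    name: str
    total_cm: float         # total shared cM after the Timber algorithm
    messaged: bool = False  # you have messaged this match
    has_note: bool = False  # you have added a note
    in_group: bool = False  # coloured dot group; starred matches count as a group


def is_retained(match: Match) -> bool:
    """Return True if the match should stay on the list after the update."""
    if match.total_cm >= THRESHOLD_CM:
        return True
    # Below the threshold, only matches you have acted on are kept
    return match.messaged or match.has_note or match.in_group
```

For example, `is_retained(Match("cousin", 7.2))` is False, but the same match with `has_note=True` is kept.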

On-site messaging will start to appear in the next few days (this messaging is now live) to alert users to the updated matching system and a new matching white paper will be available later this week. (The white paper has now been published and can be accessed here.) We can expect to see the new matching system rolled out in early August.

The increase in the match threshold will mean that many matches will disappear from our match lists. However, in practice, this is not going to have any effect on our genealogical research as these small matches have proved to be so unreliable that they are impossible to work with. The last time I analysed my matches at AncestryDNA and compared them with my parents' match lists I found that 54% of my matches in the 6-7 cM range did not match either of my parents and were therefore probably false positives. (1) Clearly if there is over a 50% chance that a match will be false we cannot reliably assign these matches to a common ancestor, even if we can identify one in our shared family trees. Even if the match is real, the chances are still very low that it will be a reflection of a recent genealogical relationship and it is far more likely to be the result of very distant sharing. (2)
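The parent and child comparison boils down to a set calculation: count the child's matches that appear on neither parent's list and divide by the child's total. The Python sketch below is a simplified illustration with a hypothetical function name and made-up match identifiers. It is not the exact procedure I used for the 2017 analysis, which was restricted to matches in a particular cM range.

```python
def fraction_unmatched(child_matches, mother_matches, father_matches):
    """Fraction of the child's matches that appear on neither parent's list."""
    child = set(child_matches)
    if not child:
        raise ValueError("child match list is empty")
    parental = set(mother_matches) | set(father_matches)
    # Matches shared with neither parent are probable false positives
    return len(child - parental) / len(child)
```

With four child matches of which one is on the mother's list and one on the father's, the function returns 0.5.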

I currently have over 32,000 matches at AncestryDNA which is far more than I can ever possibly cope with. However, if you really are desperate to go through your matches and check the 6 and 7 cM matches before they disappear you can use the filter under Shared DNA to set a custom cM range to identify these matches.

In other news AncestryDNA's corporate page has been updated to show that they have now tested 18 million people. AncestryDNA now have by far the largest genetic genealogy database in the world. 23andMe is the next largest with a database of 12 million people. MyHeritage have 4 million people in their database, while FamilyTreeDNA have tested over two million people. (3)

The lockdown seems to have encouraged a renewed interest in family history so we can also look forward to receiving many more matches in the months and years to come.

Update 4th August
The update has been delayed and will now be rolled out in stages. You will find full details, including FAQs, when you log into your AncestryDNA account.



Ancestry is now displaying decimal points for all matches sharing under 10 cM. All matches sharing under 8 cM will be removed at the end of August. This includes matches in the 7.5 to 7.9 cM range which were previously rounded up to 8 cM.
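To see why a match previously shown as 8 cM can still fall below the new cut-off, here is a small Python sketch. It assumes that the old display rounded halves upwards to the nearest whole number, which is my reading of the change rather than anything Ancestry has documented, and the function names are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

THRESHOLD_CM = Decimal("8")


def displayed_whole(cm: str) -> int:
    """Round to a whole number with halves rounding up (assumed old display)."""
    return int(Decimal(cm).quantize(Decimal("1"), rounding=ROUND_HALF_UP))


def below_threshold(cm: str) -> bool:
    """Compare the unrounded value with the 8 cM cut-off."""
    return Decimal(cm) < THRESHOLD_CM
```

A match of 7.6 cM would have been displayed as 8 under this assumption, yet `below_threshold("7.6")` is True, so it would be removed unless it has been saved.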

Further reading
Footnotes
1. See my blog post Comparing parent and child matches at AncestryDNA from August 2017 for the full details of this analysis.
2. See the ISOGG Wiki page on identity by descent which includes a chart from a 2015 paper by Doug Speed and David Balding providing the distribution of different-sized segments by generation.
3. FamilyTreeDNA do not publish details of the size of their autosomal DNA database. The two million figure for the number of people tested is taken from the FAQs on their home page. In the section headed "Who is FamilyTreeDNA?" they say: "Over 2 million people have tested with FamilyTreeDNA, resulting in the most comprehensive DNA matching database in the industry." FTDNA used to publish daily updates on the number of Y-DNA and mtDNA records in the database on their "Why choose FamilyTreeDNA?" page. However, the figures on this page have not been updated since July 2019. Martin McDowell did an analysis in February 2020 based on FTDNA kit numbers in which he estimated that FTDNA's autosomal DNA database was approaching two million. See the blog post "How big is the FamilyTreeDNA database" on the Genetic Genealogy Ireland website.

Updates
This page was updated on 15 July 2020 to include a third footnote to clarify information about the size of the FamilyTreeDNA database. It was updated on 16 July to include a link to the updated AncestryDNA white paper and a further reading list. It was also updated to clarify that starred matches will not be retained. The page was updated on 17 July to include links to blog posts from Blaine Bettinger and Leah Larkin. The page was updated on 19 July following the receipt of an e-mail from AncestryDNA which clarified that starred matches would be retained after all and that any matches you save will also be automatically saved on your match's account. Further details from Ancestry about the changes in the reporting of segments were added to numbered points 1 and 2. A link to Judy Russell's blog post was added on 28 July.

Thursday, 9 April 2020

DNA ethnicity article in May issue of Who Do You Think You Are? Magazine

The May issue of Who Do You Think You Are? Magazine is out now. It includes a big feature article from me on DNA "ethnicity" estimates and what they can and can't tell you.

While you might not be able to get out to the shops to buy a copy you can order a digital copy online or take out a subscription through Mags Direct to have the magazine delivered through your letter box.

There's always lots of interesting content in the magazine on a wide variety of genealogy subjects. I have two further DNA feature articles scheduled for later this year too.

To find out more check out the WDYTYA Magazine website. You can see a sneak preview of the May issue here.

Updated 27th April 2020
An edited version of the article is now available online on the WDYTYA website.

Wednesday, 19 February 2020

30x whole genome sequencing from Nebula Genomics for $299


The cost of whole genome sequencing has been slowly coming down to an affordable level. Dante Laboratories had a special offer on their direct-to-consumer (DTC) whole genome sequencing (WGS) service in November 2018 when the test was priced at €169 (£150 or $199). They now offer a 30x whole genome test for €289 (reduced from €599). 30x refers to the coverage of the test, that is, the number of reads at each position. 30x is now the standard coverage for medical purposes. Dante are based in Italy and initially focused on the European market but now sell their test globally. They had sequenced over 10,000 genomes by the end of 2019 and are currently processing 600 to 700 genomes per week.
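In practice coverage is an average across the genome, and it can be estimated as the number of reads multiplied by the read length and divided by the genome size. The Python sketch below works through the arithmetic. The read length of 150 bases and the genome size of 3.1 billion bases are illustrative assumptions of mine and not the specifications of any of the tests mentioned here.

```python
def mean_coverage(read_count: int, read_length: int, genome_size: int) -> float:
    """Average number of reads covering each position in the genome."""
    return read_count * read_length / genome_size


# Illustrative assumptions only: 150-base reads, 3.1 billion base genome
GENOME_SIZE = 3_100_000_000
READ_LENGTH = 150

# 620 million reads of 150 bases give 93 billion bases in total
coverage = mean_coverage(620_000_000, READ_LENGTH, GENOME_SIZE)
```

Under these assumptions about 620 million reads are needed to reach 30x.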

DTC whole genome sequencing has also been offered for several years by Full Genomes Corporation though they no longer sell their tests in the European Union. FGC offer a range of tests at different coverage as well as a long-read whole genome test for $2900. A range of DTC WGS tests is also available at varying levels of coverage from the German company YSEQ. The British company SanoGenetics launched a DTC whole genome sequence test priced at £950 at the end of 2019 with an emphasis on data security. They hope to provide access to genetic counsellors, a doctor and good links to the UK's National Health Service but it is likely to be more than a year before they are in a position to deliver on this promise.

The market is now hotting up with the announcement that Nebula Genomics have launched a new 30x whole genome sequencing service for $299 (£231 or €277). The Nebula product will be available in 188 countries. Nebula are based in the US with offices in San Francisco and Boston. The sequencing is currently being done by BGI in Hong Kong. Nebula have partnered with FamilyTreeDNA to provide an analysis of the Y-chromosome and mitochondrial DNA sequences which are included with the service. The following information about the Y-DNA and mtDNA ancestry analysis is provided in Nebula's FAQs.
It is not clear how the transfer process will work but I presume that the sequences will be uploaded to FTDNA's BigY database and mitochondrial DNA database in order to receive genealogical matches. I suspect the promise of additional ancestry reporting will be in the form of an option to transfer a Family Finder-compatible file to FTDNA's autosomal DNA database.

On top of the cost of the testing it is necessary to pay a subscription for access to Nebula's reports which are updated on a weekly basis. You can choose a monthly, annual or lifetime subscription.


It doesn't seem to be possible to order a test without paying for a subscription so it appears that you would have to sign up for at least a single subscription for one month once you have received your results.

There is further information about Nebula Genomics in this article from OneZero.

Whole genome sequencing is not likely to be of interest for the average genealogist in the immediate future. To use a WGS test for genealogy we would need to have a WGS database so that we can be matched with our genetic cousins. No such database currently exists though I suspect it's only a matter of time before an enterprising company decides to take the initiative and set up a service. For now WGS is likely to be of most interest for genealogical purposes for the Y-chromosome data to see how the sequencing compares with other sequencing products such as FamilyTreeDNA's BigY test. WGS will also appeal to advanced genetic genealogists who like manipulating and playing with big data files. For example Louis Kessler, a genetic genealogist with a background in computer programming, has purchased a number of WGS tests and has had great fun analysing the files out of sheer scientific curiosity.

None of the major genetic genealogy companies currently offers a WGS test but I suspect it's only a matter of time.

Tuesday, 7 January 2020

The end of an era: goodbye to the Rootsweb mailing lists

It was announced today that the Rootsweb genealogy mailing lists will be discontinued and archived. Here is the e-mail I received from the Rootsweb Listowners list.
Beginning March 2nd, 2020 the Mailing Lists functionality on RootsWeb will be discontinued. Users will no longer be able to send outgoing emails or accept incoming emails. Additionally, administration tools will no longer be available to list administrators and mailing lists will be put into an archival state.

Administrators may save the emails in their list prior to March 2nd. After that, mailing list archives will remain available and searchable on RootsWeb.

As an alternative to RootsWeb Mailing Lists, Ancestry message boards are a great option to network with others in the genealogy community. Message boards are available for free with an Ancestry registered account.

Thank you for being part of the RootsWeb family and contributing to this community.

Sincerely,

The RootsWeb team
When I first started my family history research nearly 20 years ago I found that the regional Rootsweb mailing lists were an invaluable source of education and assistance, and I made many friends on these lists. Unfortunately the functionality of the lists has been greatly reduced for many years now. The lists were offline for a considerable time as a result of security issues and they were eventually transferred to a new host in July 2018. It was perhaps inevitable that with all these problems discussions would move elsewhere.

While mailing lists used to be the central focus of genealogical life, they are now used much less often and I find that most of my genealogy and DNA conversations now take place in the various Facebook groups and also on Twitter. I am the admin of a few surname lists on Rootsweb but no one has posted on these lists for many months. If a service is not supported it is inevitable that it will eventually disappear. The demise of the Rootsweb lists is not a big surprise, but it does feel like the end of an era.

The decline in the use of mailing lists was no doubt also a factor in the decision by Yahoo to shut down all the web hosting for their Yahoo Groups. Yahoo hosted many of the popular DNA lists as well as a number of genealogy lists. The Yahoo lists will continue to function as email lists but without any archiving facility. All the old conversations have been deleted. This is a salutary lesson that all websites need to be backed up and archived in order to ensure their preservation. I suspect a huge amount of knowledge and history has already been lost as many groups have disappeared without backups being made.

No doubt some of the Rootsweb lists and Yahoo Groups will find a new home elsewhere. Some mailing lists have now moved over to IO Groups. I am one of the admins of the Haplogroup R1b-U106 list and we moved our U106 list from Yahoo to IO Groups. We have been very pleased with the service from IO Groups. There is a lot of additional functionality which we have found very useful. If you have been hosting a list on Rootsweb and are looking for a new platform then IO Groups would be a good alternative.

Facebook is not everyone's cup of tea but it is home to a vibrant genealogy community. Katherine Willson does a brilliant job of tracking and categorising all the genealogy and history groups on Facebook. Do check out her Genealogy on Facebook list. At the last update in May 2019 it included over 14,500 links.

If you are particularly interested in genetic genealogy check out the ISOGG Wiki list of genetic genealogy mailing lists and Facebook groups.

Facebook has over two billion users around the world so it's not going away any time soon – or at least not until the next big thing comes along and who knows what that might be?

Saturday, 4 January 2020

New lower pricing structure at FamilyTreeDNA

The FTDNA sale has now ended but the good news is that the prices haven’t gone back up to the old pricing levels and the new prices are now much lower. When ordering direct from the FTDNA home page there are now only five tests available:
  • Y-37 $119 (previously $169)
  • Y-111 $249 (previously $359)
  • BigY-700 $449 (previously $649)
  • mtDNA full sequence $159 (previously $199)
  • Family Finder autosomal DNA test $79 (no change)
The old price of the BigY-700 test included access to the raw data file (the BAM file). However, most people did not want the raw data file which meant that the price was artificially inflated for the benefit of the few. If you do want your BAM file you can now purchase it as an add on for $100.

Shipping costs $9.95 in the US and $12.95 to most international destinations.

The 25-marker test and the 67-marker test have now been discontinued. The 12-marker test is still available for $59 but can only be ordered through a project. You can access the project search menu here.

There used to be discounts available when ordering kits through projects but these discounts are no longer available. However, with the new lower prices I would hope that all of us with projects at FTDNA will see renewed interest in our projects in 2020.

In addition to the reduced prices for new tests there are also big reductions in the upgrade prices for Y-DNA tests. Dave Nicolson compiled a spreadsheet showing the new pricing which he shared in the Only FTDNA Project Administrators Group on Facebook. He has kindly given me permission to reproduce his chart below.


Discounts for members of the Guild of One-Name Studies
If you are a member of the Guild of One-Name Studies note that you can buy 37-marker tests and Family Finder tests at discounted prices from the Guild.

The Y-DNA test is currently £88 from the Guild. The cost from FTDNA is £91 at current exchange rates. Postage rates are calculated individually by the Guild and you would need to pay return postage for the kit to Texas but for most people, especially outside the US, it is likely to be cheaper to buy a 37-marker test from the Guild, and especially so if you can pick up a kit at one of the Guild events. 

You can also buy FTDNA Family Finder kits direct from the Guild for £40. This is a considerable saving on the current price of $79 from FTDNA which works out at £60 at current exchange rates. Postage would again be extra.
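The sterling figures quoted above can be reproduced with a quick calculation in Python. The exchange rate of 0.765 pounds to the dollar is an assumption inferred from the $119 and £91 figures in this post. It is not a quoted market rate, and postage is left out.

```python
USD_TO_GBP = 0.765  # assumed rate, inferred from $119 being roughly £91


def usd_to_gbp(amount_usd: float, rate: float = USD_TO_GBP) -> int:
    """Convert a dollar price to the nearest whole pound."""
    return round(amount_usd * rate)


# Saving per kit when buying from the Guild, before postage
family_finder_saving = usd_to_gbp(79) - 40  # Guild price £40
y37_saving = usd_to_gbp(119) - 88           # Guild price £88
```

On these figures the Guild price is about £20 lower for Family Finder and about £3 lower for the 37-marker test.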

For further details see the page on the Guild website on DNA kits available from the Guild.