The following message is posted on behalf of Full Genomes Corporation:
Full Genomes Corporation (FGC) is announcing the official launch of a service to analyze BAM files from Family Tree DNA's Big Y product. The analysis is being launched at a price of $50 per kit. Recently, FGC had offered the Big Y analysis for a limited time, as a beta product, at no charge. FGC will continue to allow individuals to contribute their BAM files to the Full Genomes database without charge, so that their results may be used in kit cross-comparisons. The offering is designed to provide broader access to FGC's proprietary Y chromosome analysis services and to build FGC's database for purposes of kit comparisons.
The analysis will include the same reports as are provided to customers of the Full Genomes sequencing product; the one difference is the mitochondrial DNA analysis, which for Big Y kits will be based on the Yoruba reference sequence. The analysis will therefore cover Y-STRs and INDELs in addition to Y-SNPs. To be clear, however, the results won't be able to achieve the same resolution as the Full Genomes sequencing product, owing to limitations in the underlying data from the Big Y test.
Interested individuals should first obtain access to their Big Y BAM file by contacting Family Tree DNA customer service. Those interested in ordering analysis can follow the instructions here to set up a Full Genomes account, make payment, and upload their BAM file; analysis will be performed in weekly batches. Those who are only interested in contributing their results to the Full Genomes comparison database may send the download link to fgcfilesharing@gmail.com, while also indicating their interest in donating their results and optionally providing a name (like FTDNA Kit Number) to associate with the results.
According to Dr. Greg Magoon, Y chromosome data analysis consultant for FGC, "I think the FGC analysis will address many of the needs that have been expressed by members of the genetic genealogy community who have been looking at Big Y results in recent weeks. In my view, the main strengths of the FGC analysis include its cross-kit comparisons and its SNP reliability classifications. We have put a lot of R&D into separating the wheat from the chaff to allow customers and researchers to quickly focus on the most reliable, phylogenetically-useful variants. I think the FGC analysis will help to significantly speed the interpretation of results and decrease the burden on busy genetic genealogists."
Separately, FGC is announcing a beta-stage referral program, which will provide customers with access to advanced analyses of their Full Genomes "next-gen" sequencing data. A Full Genomes customer who refers at least three other individuals to order the Full Genomes test will be entitled to a bleeding-edge, advanced analysis of their choosing. Potential analysis options include:
- Remapping of results to the newer, build 38 human genome reference sequence
- Remapping of results with a new and improved alignment algorithm/approach
- Y-STR analysis using a newer, larger STR database
- Phylogenetic analysis for portions of the Y tree
- Variant calling (SNPs and INDELs) for autosomal and X-chromosome data
Interested customers are advised to contact sales@fullgenomes.com to supply documentation of referrals and to discuss custom analysis options.
Dr Magoon said: "From a research perspective, I'm very excited about the potential for the referral program to push the boundaries of Y-chromosome analysis. We've already been able to work with customers on a case-by-case basis to do some very interesting customized analyses with the Full Genomes results, including the identification of large duplications and deletions through copy-number variation (CNV) analysis."
Speaking about FGC's next-gen sequencing test, CEO Justin Loe said: "The FGC Y chromosome product is the most comprehensive in the market today but it is also, as we recognize, expensive for many potential customers. Over the near term, we expect to be able to make this product more affordable. Additionally, with the advent of new sequencing technologies other products will also be offered."
In fact, in honor of DNA Day, Full Genomes is currently offering a limited-time discount of 20% off the normal price for their comprehensive Y chromosome sequencing test (using coupon code "FGCDNA").
Dr Magoon commented: "I think what we're seeing across genetic genealogy is that companies are finding a niche with products focused on particular areas. For example, 23andMe has been a pioneer in autosomal DNA. We have seen that BritainsDNA has been making great advances in developing innovative chip-based tests for Y chromosome (and other) markers. Family Tree DNA has established a leadership role in Y-STRs and in full mitochondrial DNA sequencing. YSEQ, with Dr. Thomas Krahn, is the world leader in developing Y chromosome marker tests using Sanger sequencing. I am very excited to see FGC working hard to establish a similar role here in the field of "next-gen" Y chromosome sequencing."
Related blog posts
- A confusion of SNPs
Friday, 25 April 2014
The new 2014 Y-DNA haplotree has arrived!
Today saw the launch of Family Tree DNA's new 2014 Y-DNA haplotree which has been created in partnership with National Geographic's Genographic Project. If you've tested with Family Tree DNA you will find the new tree by going to your personal page and clicking on "haplotree and SNPs". Below is a screenshot of the upper portion of the tree for haplogroup R:
The tree is too large to fit into a single screenshot. Here is the relevant portion of the tree for my dad who is R-Z12, a sub-branch of U106.
Note that this is very much an interim tree. It is based on SNPs tested with the Genographic Project's Geno 2.0 chip, and the cut-off date for inclusion of SNPs is November 2013. The new tree does not include all the thousands of new SNPs identified from testing with Big Y, Full Genomes and Chromo 2. The tree will eventually be much more comprehensive, but FTDNA are being careful about the data they use from other sources and are insisting that SNPs are only added from published data and raw data that they have personally verified, rather than from interpreted data. They have promised that at least one update will be released this year. The FTDNA Learning Center will eventually be updated with information about the new haplotree. If you have questions about a particular SNP that is in the wrong place on the tree, or if you spot any other errors, you should send an e-mail to the FTDNA help desk with "Y-Tree" in the subject line.
FTDNA are now recommending SNPs for people to test. I've only had a chance to look briefly at the SNP recommendations for a few project members, but it is already apparent that in some cases the recommended SNPs are not appropriate. SNPs are only recommended if they pass certain percentage thresholds, and there may well be a downstream SNP that would be more suitable. If you are interested in ordering single-SNP testing, make sure you join the appropriate Y-DNA haplogroup project and seek advice from the project administrators. Otherwise you could end up wasting money ordering unnecessary SNPs.
The following information has been provided by Family Tree DNA.
FAST FACTS
• Created in partnership with National Geographic’s Genographic Project
• Used GenoChip containing ~10,000 previously unclassified Y-SNPs
• Some of those SNPs came from Walk Through the Y and the 1000 Genomes Project
• Used first 50,000 high-quality male Geno 2.0 samples
• Verified positions from the 2010 YCC tree by Sanger sequencing additional anonymous samples
• Filled in data on rare haplogroups using later Geno 2.0 samples
Statistics
• Expanded from approximately 400 to over 1200 terminal branches
• Increased from around 850 SNPs to over 6200 SNPs
• Cut-off date for inclusion for most haplogroups was November 2013
Total number of SNPs broken down by haplogroup:
A 406
B 69
BT 8
C 371
CT 64
D 208
DE 16
E 1028
F 90
G 401
H 18
I 455
IJ 29
IJK 2
J 707
K 11
K(xLT) 1
L 129
LT 12
M 17
N 168
NO 16
O 936
P 81
Q 198
R 724
S 5
T 148
myFTDNA Interface
• Existing customers receive free update to predictions and confirmed branches based on existing SNP test results.
• Haplogroup badge updated if new terminal branch is available
• Updated haplotree design displays new SNPs and branches for your haplogroup
• Branch names now listed in shorthand using terminal SNPs
• For SNPs with more than one name, in most cases the original name for the SNP was used, with synonymous SNPs listed when you click "More…"
• No longer using SNP names with .1, .2, .3 suffixes. Back-end programming will place SNP in correct haplogroup using available data.
• SNPs recommended for additional testing are pre-populated in the cart for your convenience. Just click to remove those you don’t want to test.
• SNPs recommended for additional testing are based on 37-marker haplogroup origins data where possible, 25- or 12-marker data where 37 markers weren't available.
• Once you've tested additional SNPs, that information will be used to automatically recommend additional SNPs for you if they’re available.
• If you remove those prepopulated SNPs from the cart, but want to re-add them, just refresh your page or close the page and return.
• Only one SNP per branch can be ordered at one time – synonymous SNPs can possibly be ordered from the Advanced Orders section on the Upgrade Order page.
• Tests taken have moved to the bottom of the haplogroup page.
Coming attractions
• Group Administrator Pages will have longhand removed.
• At least one update to the tree to be released this year.
• Update will include: data from Big Y, relevant publications, other companies' tests from raw data.
• We'll set up a system for those who have tested with other big data companies to contribute their raw data file to future versions of the tree.
• We're committed to releasing at least one update per year.
• The Genographic Project is currently integrating the new data into their system and will announce on their website when the process is complete in the coming weeks. At that time, all Geno 2.0 participants’ results will be updated accordingly and accessible via the Genographic Project website.
BACKGROUND
Family Tree DNA created the 2014 Y-DNA Haplotree in partnership with the National Geographic Genographic Project using the proprietary GenoChip. Launched publicly in late 2012, the chip tests approximately 10,000 Y-DNA SNPs that had not, at the time, been phylogenetically classified.
The team used the first 50,000 male samples with the highest quality results to determine SNP positions. Using only tests with the highest possible “call rate” meant more available data, since those samples had the highest percentage of SNPs that produced results, or “calls.”
In some cases, SNPs that were on the 2010 Y-DNA Haplotree didn’t work well on the GenoChip, so the team used Sanger sequencing on anonymous samples to test those SNPs and to confirm ambiguous locations.
For example, if it wasn’t clear if a clade was a brother (parallel) clade, or a downstream clade, they tested for it.
The scope of the project did not extend beyond the SNPs currently on the GenoChip, so that the tree would be based on the most data available at the time; the cut-off for inclusion was approximately November 2013.
Where data were clearly missing or underrepresented, the team curated additional data from the chip where it was available in later samples. For example, there were very few Haplogroup M samples in the original dataset of 50,000, so to ensure coverage, the team went through eligible Geno 2.0 samples submitted after November, 2013, to pull additional Haplogroup M data. That additional research was not necessary on, for example, the robust Haplogroup R dataset, for which they had a significant number of samples.
Family Tree DNA, again in partnership with the Genographic Project, is committed to releasing at least one update to the tree this year. The next iteration will be more comprehensive, including data from external sources such as known Sanger data, Big Y testing, and publications. If the team gets direct access to raw data from other large companies’ tests, then that information will be included as well. We are also committed to at least one update per year in the future.
Known SNPs will not intentionally be renamed. Their original names will be used since they represent the original discoverers of the SNP. If there are two names, one will be chosen to be displayed and the additional name will be available in the additional data, but the team is taking care not to make synonymous SNPs seem as if they are two separate SNPs. Some examples of that may exist initially, but as more SNPs are vetted, and as the team learns more, those examples will be removed.
In addition, positions or markers that fall within STRs, or large insertion/deletion events inside homopolymers, may also be curated out as they are discovered, because such events cannot be accurately proven. A homopolymer is a sequence of identical bases, such as AAAAAAAAA or TTTTTTTTT. In such cases it’s impossible to tell which of the bases the insertion is, or if/where one was deleted. With technology such as next-generation sequencing, trying to call SNPs in regions such as STRs or homopolymers doesn’t make sense, because we’re discovering non-ambiguous SNPs that define the same branches, so we can use the non-ambiguous SNPs instead.
Some SNPs from the 2010 tree have been intentionally removed. In some cases, those were SNPs for which the team never saw a positive result, so while such a SNP may be legitimate, even haplogroup-defining, it was outside the current scope of the tree. In other cases, the SNP was found in so many locations that it could cause the tree to be drawn in more than one way. If a SNP could legitimately be positioned in more than one haplogroup, the team deemed it not to be haplogroup-defining, but rather a highly polymorphic location.
To that end, SNPs no longer have .1, .2, or .3 designations. For example, J-L147.1 is simply J-L147, and I-L147.2 is simply I-L147. Those SNPs are positioned in the same place, but back-end programming will assign the appropriate haplogroup using other available information such as additional SNPs tested or haplogroup origins listed. If other SNPs have been tested and can unambiguously prove the location of the multi-locus SNP for the sample, then that data is used. If not, matching haplogroup origin information is used.
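To make that placement rule concrete, here is a minimal sketch in Python of how a multi-locus SNP might be assigned to a haplogroup. This is purely illustrative and is not FTDNA's actual back-end code; the SNP names and lookup tables are invented toy data, and the real system presumably works against the full tree and sample database.

```python
# Illustrative sketch only -- not FTDNA's actual back-end code.
# It mimics the placement rule described above for a multi-locus SNP such as
# L147, which occurs in more than one haplogroup (formerly L147.1, L147.2, ...).

# Hypothetical lookup: haplogroups in which each multi-locus SNP is known to occur.
MULTI_LOCUS_SNPS = {
    "L147": {"J", "I"},
}

def place_multi_locus_snp(snp, other_positive_snps, snp_to_haplogroup, origin_haplogroup):
    """Return the haplogroup to which a sample's multi-locus SNP should be assigned.

    other_positive_snps -- other SNPs for which the sample has tested positive
    snp_to_haplogroup   -- toy map from an unambiguous SNP to its haplogroup
    origin_haplogroup   -- haplogroup suggested by STR-based haplogroup-origins data
    """
    candidates = MULTI_LOCUS_SNPS[snp]

    # Rule 1: if the sample's other tested SNPs point unambiguously at one
    # candidate haplogroup, use that.
    supported = {snp_to_haplogroup[s] for s in other_positive_snps if s in snp_to_haplogroup}
    matches = candidates & supported
    if len(matches) == 1:
        return matches.pop()

    # Rule 2: otherwise fall back to the haplogroup-origins information.
    if origin_haplogroup in candidates:
        return origin_haplogroup
    return None  # cannot be placed with the available data

# A sample positive for M267 (haplogroup J in this toy map) is assigned J-L147.
print(place_multi_locus_snp("L147", {"M267"}, {"M267": "J", "M170": "I"}, "I"))
```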
We will also move to shorthand haplogroup designations exclusively. Since we’re committing to at least one iteration of the tree per year, using longhand that could change with each update would be too confusing. For example, Haplogroup O used to have three branches: O1, O2, and O3. A SNP was discovered that combined O1 and O2, so they became O1a and O1b.
There are over 1200 branches on the 2014 Y Haplogroup tree, as compared to about 400 on the 2010 tree. Those branches contain over 6200 SNPs, so we’ve chosen to display select SNPs as “active” with an adjacent “More” button to show the synonymous SNPs if you choose.
The Genographic Project is currently integrating the new data into their system and will announce on their website when the process is complete in the coming weeks. At that time, all Geno 2.0 participants’ results will be updated accordingly and will be accessible via the Genographic Project website.
QUOTES
Elliot Greenspan has provided the following quotes in conversation with Janine Cloud, Family Tree DNA's GAP Liaison and Events Co-ordinator:
"I want it to be the most accurate tree it can be, but I also want it to be interesting. That's the key. Historical relevance is what we're to discover. Anthropological relevance. It's not just who has the largest tree, it's who can make the most sense out of what you have [that] is important."
"This year we're committing to launching another tree. This tree will be more comprehensive, utilizing data from external sources: known Sanger data, as well as data such as Big Y, and if we have direct access to the raw data to make the proof (from large companies, such as the Chromo2) or a publication, or something of that nature. That is our intention that it be added into the data."
"We’re definitely committed to update at least once per year. Our intention is to use data from other sources, as well as any SNPs we can, but it must be well-vetted. NGS and SNP technology inherently has errors. You must curate for those errors otherwise you’re just putting slop out to customers. There are some SNPs that may bind to the X chromosome that you didn’t know. There are some low coverages that you didn’t know."
"With technology such as this [next-generation sequencing] you're able to overcome the urge to test only what you’re likely to be positive for, and instead use the shotgun method and test everything. This allows us to make the discovery that SNPs are not nearly as stable as we thought, and they have a larger potential use in that sense."
"Not only does the raw data need to be vetted but it needs to make sense. Using Geno 2.0, I only accepted samples that had the highest call rate, not just because it was the best quality but because it was the most data. I don't want to be looking at data where I'm missing potential information A, or I may become confused by potential information B. That is something that will bog us down. When you’re looking at large data sets, I’d much rather throw out 20% of them because they’re going to take 90% of the time than to do my best to get one extra SNP on the tree or one extra branch modified, that is not worth all of our time and effort. What is, is figuring out what the broader scope of people are, because that is how you break down origins. Figuring one single branch for one group of three people is not truly interesting until it's 50 people, because 50 people is a population. Three people may be a family unit. You have to have enough people to determine relevance. That's why using large datasets and using complete datasets are very, very important."
Update 27 April 2014
A recording of the Family Tree DNA webinar presented by Elise Friedman on the launch of the new 2014 haplotree is now available online and can be accessed here (free registration required).
Related blog posts
- A confusion of SNPs
Wednesday, 23 April 2014
The 2014 Y-DNA haplotree and special offers for DNA day
Family Tree DNA group administrators have received notification that the long-awaited 2014 Y-DNA haplotree is to be launched on Friday 25th April to coincide with DNA Day and also with something known in America as National Arbor Day, which I'd never previously heard of but which is rather aptly dedicated to the planting and nurturing of trees, albeit real ones rather than those constructed from DNA. Starting on Friday the 37-marker Y-DNA test will also be on sale for a limited time. Here are the details:
National DNA Day, celebrated on April 25, commemorates the completion of the Human Genome Project in April 2003 and the description of DNA's double-helix structure in 1953.
Since 1970, the U.S. has observed National Arbor Day, dedicated to the planting and nurturing of trees, on the last Friday in April.
This year National Arbor Day falls on National DNA Day, so what better opportunity for Family Tree DNA to release the long-awaited 2014 Y-DNA Haplotree!
We wanted you, the group administrators who have done so much to contribute to the success of the company, to know before we release the news to the entire Y database and the genetic genealogy community.
In addition to expanding the tree from 400 to 1000 terminal branches, the Haplotree page will have an updated, fresh design.
Our engineering team will begin to push the code that will update the database prior to the official release of the tree, so you'll see some changes in terminal SNPs and haplogroups for those who have done additional testing.
To help with the transition, our Webinar Coordinator, Elise Friedman, will host a live webinar on DNA Day, Friday, April 25, 2014 @ 12pm Central (5pm UTC), with a demonstration of the new tree and more details about this landmark update.
To register, click here: http://bit.ly/1dGbbbx
A recording of this webinar will be posted to the Webinars page of our Learning Center within 24-48 hours after the live event: https://www.familytreedna.com/learn/ftdna/webinars
***********************************************************************
And because we know you're going to ask...we will have a DNA Day sale that suits the occasion!
Y-DNA SNPs will be 20% off from April 25 - 29. In addition, the Y-DNA 37 test will be 20% off the retail price.
The sale officially begins at 12.01 am Houston time on 25th April and ends at 11.59 pm on 29th April. If you are ordering a Y-DNA test make sure you order through a surname project or a geographical project to benefit from the additional project discount. As always I would be very happy to welcome new members to my Cruwys/Cruse/Cruise/Crew(es) DNA Project and my Devon DNA Project.
Thomas Krahn's company YSEQ has also announced a price reduction. Single SNPs are reduced to $25 with immediate effect through until Father's Day on 15th June 2014. For further details about YSEQ see my previous blog post YSEQ.net - a new company offering a single SNP testing service.
Family Tree DNA last updated their Y-DNA haplotree back in 2010. There have been a huge number of changes since then so the new tree will be most welcome. However, with the tsunami of new SNPs now being identified from the Big Y, Full Genomes and Chromo 2 tests, the 2014 tree is already going to be very out of date as soon as it is published. To understand the problem read my previous blog post on a confusion of SNPs. I presume the new tree will also see the full implementation of the shorthand naming system. For example, the format R-Z12 will be used instead of the unwieldy longhand version which, according to the current ISOGG Y-SNP tree, is R1b1a2a1a1c2b2a1a1a1. I would also hope that the new tree will have the facility built in to allow more frequent updates in the future. Let's wait and see what Friday brings. Here's hoping for a smooth transition.
Sunday, 20 April 2014
Guild of One Name Studies 2014 Conference in Ashford, Kent
Last weekend I spent a very enjoyable couple of days in Ashford in Kent at the Guild of One-Name Studies' Conference. I do not like driving on motorways at the best of times, and especially not the M25, so I decided to travel by train, which gave me the chance to see for the first time the interior of the magnificently restored St Pancras Station, the terminus for the Eurostar services to Europe, and where I picked up my connection for Ashford International Station. The Ashford train is on the new high speed line to the Kent coast with shiny new carriages that are so posh that I thought I'd sat down in first class by mistake! At Ashford station I met up with fellow Guild members Jennifer Tudbury and Denise Bright who were sharing the lift with me from the station to the hotel. Cliff Kemball, who organised the conference with Bob Cumberbatch, had somehow managed to find the time in his busy schedule to act as our chauffeur. We got to the hotel soon after 5.00 pm. After checking in and unpacking there was time for a quick cup of tea and an impromptu Berkshire meeting with Gillian Stevens, Chad Hanna and Ivan Dickason, before heading off to the buffet supper.
After the meal there was an option to attend a presentation by Peter Hagger on the proposed changes that are planned for the Guild's Constitution. Although a somewhat dry subject Peter managed to make the review sound very interesting and gave us much food for thought. We were also given the chance to provide feedback on the proposed changes. Peter's constitutional review was followed by a fascinating talk by local author Bob Ogley on life in nineteenth-century Kent which included anecdotes about some of the famous names associated with the county such as Charles Darwin and Charles Dickens. Many of us then retreated to the bar for a few drinks and a chat.
We had to be up early on Saturday as the programme started at 9.00 am. Derek Palgrave, the President of the Guild, opened the meeting, and Kirsty Gray, the Guild Chairman for the preceding year, provided a review of Guild activities. For the third year running the conference proceedings were livestreamed. The recordings will eventually be spliced and diced and uploaded to the Guild's YouTube channel, provided that the speakers have given permission. I therefore won't go into too much detail about the individual talks but would encourage you to watch the recordings. Until the individual recordings have been uploaded you can watch the proceedings from Day 1 here and the proceedings from Day 2 here. I took my camera with me to the conference but somehow did not manage to take any photographs. I was at the back of the room and not in a good position to photograph the speakers. However, Peter Hagger has very kindly shared his photographs with me and given me permission to publish some of them here. Further photos will appear in the next issue of JOONS - the Journal of One-Name Studies.
Dick Eastman was the keynote speaker for the conference, and he was the first speaker on Saturday morning. He shared with us his vision of the future of genealogy which included a strong emphasis on the role of DNA testing, particularly for medical purposes.
Having not got to bed until after 2.00 am the night before I was relieved that we had a late start to the Sunday sessions! There was some confusion over the start times with two competing timetables but it all seemed to work out in the end. I went along to the FamilySearch breakout session hosted by Paul Smart. He provided us with a very useful overview of all the different FamilySearch features. There are now 4.43 billion names in FamilySearch. There are over 100,000 indexers but there are still many more records waiting to be indexed and more indexers are always needed. FamilySearch has a little-known labs feature where they try out new services. One of my favourite FamilySearch features is the wonderful England 1851 jurisdictions map. This map is continuing to be developed and is in the process of being expanded to include Welsh parishes. FamilySearch now provide the facility to export search results in a spreadsheet but you must be signed into your FamilySearch account before you can do so.
Bob Cumberbatch gave an excellent talk on his top ten free tools for a one-name study. I am already using some of the tools that he recommends, but he mentioned some other tools which I have not yet had a chance to explore and which I now hope to find time to investigate. One of the tools he recommended is Evernote, which has been recommended to me by a number of other people too. Evernote has a particularly useful OCR (optical character recognition) facility which is very handy for converting digital newspaper images into text files. A similar facility is offered by Google Docs (now part of Google Drive), though file sizes are limited to 2 megabytes. Google Fusion Tables can be used to generate heat maps, and is another tool I hope to explore. Outwit Hub is a scraper which can be used to extract records from a database in an orderly fashion. Jo Tillin has written an excellent blog post on how she uses Outwit Hub in her one-name study, and Tony Timmins has also written about Outwit Hub on his blog. Bob has kindly made his slides available online and they can be downloaded from this link.
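As a brief aside that was not part of Bob's talk: if you would rather generate a heat map locally than upload data to Google Fusion Tables, something similar can be sketched in Python with the pandas and matplotlib libraries. The county and decade figures and the output filename below are invented purely for illustration.

```python
# A minimal sketch, not one of the tools from Bob's talk: it turns a small table of
# surname counts into a heat map using pandas and matplotlib instead of Google
# Fusion Tables. All figures and the output filename are invented for illustration.
import pandas as pd
import matplotlib.pyplot as plt

# Toy data: birth registrations for the study surname by county and decade.
records = pd.DataFrame({
    "county": ["Devon", "Devon", "Somerset", "Somerset", "Kent", "Kent"],
    "decade": [1850, 1860, 1850, 1860, 1850, 1860],
    "births": [34, 41, 12, 15, 3, 5],
})

# Pivot into a county-by-decade grid and plot it as a heat map.
grid = records.pivot_table(index="county", columns="decade", values="births", fill_value=0)

fig, ax = plt.subplots()
image = ax.imshow(grid.values, cmap="Reds")
ax.set_xticks(range(len(grid.columns)))
ax.set_xticklabels(grid.columns)
ax.set_yticks(range(len(grid.index)))
ax.set_yticklabels(grid.index)
ax.set_xlabel("Decade")
ax.set_ylabel("County")
fig.colorbar(image, ax=ax, label="Births")
fig.savefig("surname_heatmap.png", dpi=150)
```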
There was just time for afternoon tea and a final chat with a few more friends before it was time to depart and make our way home. Lifts to the station were kindly arranged for those of us travelling by train.
The Guild President Derek Palgrave opens the 2014 Conference. Photograph by Peter Hagger.
Paul Cullen from the Family Names of the UK (FanUK) project, sporting a colourful Mohican haircut, was the next speaker talking on the subject of the Kentish surnames in the FanUK database. His talk was the highlight of the conference for me. I wrote previously about FanUK after attending my first Guild conference back in 2011. The project is attempting to provide a comprehensive database of the family surnames of the UK, and will look at their origins, history and geographical distribution. There are currently 45,281 entries in the FanUK database. Of these, 19,524 are main entries and 27,778 are variant spellings. (The maths doesn't add up because some main entries are also variant spellings of other surnames and vice versa.) For British surnames Reaney and Wilson's sets of early bearers were augmented with references from many different sources such as the fourteenth-century poll taxes, the patent rolls, the feet of fines and the International Genealogical Index. There are 5,308 Irish entries. Woulfe and MacLysaght were corrected and augmented with early name bearers from the Annals of Ulster, the Tudor Fiants, Flaxgrowers and other sources. There are 1,074 Scottish Gaelic names, and 3,650 non-Gaelic Scots names. Finally there are 3,781 recent immigrant surnames (for example, Aziz, Mehmet, Patel and Wong).
The project officially ended on the last day of March and is currently 96% complete (there are a few stragglers for the letter W). The database will be published by Oxford University Press and will be available in book form and also as an online database. The copy-editing and production process will take two years, so we are looking at publication some time in 2016. There is further information on the Arts and Humanities Research Council's website. Funding has now been received to continue the research for an additional two years and nine months. The second stage of the project, known as FanUK 2, will allow the researchers to study an additional 15,000 surnames which have over 20 name bearers (the original cut-off point was 100 name bearers). Paul then took us on a tour of some of the Kentish surnames that he has researched for FanUK. Maps generated from Steve Archer's excellent Surname Atlas CD featured very prominently in the presentation.
After lunch there was a panel session on “How I run my one-name study”. Through the wonders of modern technology Tessa Keough joined us from the West Coast of America, having bravely got up at an unearthly hour of the morning to contribute to the session. The technology didn't work out quite as planned, as we could only hear Tessa and see her slides without seeing her on the video link, but it was nevertheless very exciting that she was able to participate in this way. Paul Howes discussed the collaborative approach adopted by the Howes/House one-name study, and Colin Spencer told us about his Lefever one-name study.
After tea there were breakout sessions provided by the three major genealogy companies, Findmypast, Ancestry and MyHeritage. I went along to the Ancestry session presented by Miriam Silverman which I found very useful. She explained that the much-loved Ancestry Old Search cannot be restored because the underlying code is broken and it can’t cope with the sheer volume of new records that are being added. While we might lament the simplicity of the Old Search it does seem that it is possible to achieve the same results using New Search but sometimes workarounds are necessary. As an aside, note that if you wish to simulate the Old Search experience you can adjust your site preferences by following the instructions here.
There were a number of comments from Guild members that it often takes several extra clicks to do something in New Search. The place name search is probably the most frustrating feature, with that infuriating dropdown menu where you are presented with a long list of places that you have no interest in whatsoever, and you have to scroll through to find the one you are interested in. Ancestry recognise the problem and hope to improve the place search, but it does not seem to be an immediate priority. Often it is easier to type the place name into the keyword search field. Another complication is that archives sometimes have different names for a parish. This is a particular problem in London. Ancestry will always use both names for indexing purposes.
It is important to understand the record collections so that you can learn how to use them. Miriam cited the example of the British Phone Book collection. It is not possible to search the phone book database by surname alone because of a contractual obligation, and you have to enter both name and place. However, one very handy hint that she gave us is that it is possible to return a list of all surnames in the phone books by doing a generic search.
The partner pages are another useful feature that Miriam brought to our attention. These pages are very helpful if you want to see which records have been digitised and indexed from a particular repository. The example that Miriam gave us was the following link which allows you to see all the records from the London Metropolitan Archives: www.ancestry.co.uk/london. I presume other partner pages must exist but so far I've not been able to find any.
After the breakout sessions we were gathered together in the foyer for the announcement of the new Committee and postholders. This announcement is normally made at the start of the afternoon session, but this year the deliberations seem to have taken much longer. The big surprise was that Corrinne Goodenough is taking over as Chairman from Kirsty Gray.
Corrinne Goodenough, the new Chairman of the Guild of One-Name Studies. Photograph by Peter Hagger.
In the evening there was a banquet which provided a good chance for everyone to get together and have a chat. Fortunately this year the band were in a different room so those who wanted to talk could stay behind in the banquet room while others enjoyed themselves on the dance floor.
Jackie Depelle, Bob Cumberbatch and Pam Smith taking a twirl on the dance floor. Photograph by Peter Hagger.
Jayne Shrimpton, the keynote speaker for Sunday, was unfortunately unable to attend at the last minute because of a family crisis. As a result the schedule was juggled around. Bob Cumberbatch moved his talk forward and Dick Eastman kindly stepped in by offering a second talk in the afternoon to fill the vacant slot.
After a break for lunch we returned for a talk on surname mapping from Tyrone Bowes. Tyrone runs a commercial mapping company which trades under the names of Irish Origenes, Scottish Origenes and British Origenes. He produces some nice-looking maps on his website and I was hoping that he might provide us with some hints and tips on how to produce maps for our one-name studies. Instead, he focused on his methodology for pinpointing the “genetic homeland” of a surname. His talk was rather muddled and the methodology was not properly explained. There were also many flaws in the assumptions he made. For example, his method seems to work on the assumption that 37-marker matches all fall within the last 1000 years since the formation of surnames. The reality is somewhat different, and we are now finding that when SNP testing is done to determine the subclade, some 37-marker and 67-marker matches actually fall within different subclades because of a process known as convergence. As a result, their common ancestor will date back several thousand years. When investigating matches with other surnames, especially in haplogroup R1b, it is essential to upgrade to 67 markers and to get some basic SNP testing done to determine the subclade.
Tyrone is also drawing conclusions on surname origins based on matches within the Family Tree DNA database. However, the FTDNA database is very US-biased. It is estimated that around 70% of the people in the FTDNA database are in America. Close matches with other surnames will, therefore, more often than not be an indication of non-paternity events in America in the last few hundred years rather than in the British Isles. There was only a short time for questions, and there was no time to discuss all these problems, but if anyone is interested in reading more about the limitations of the methodology it is worth looking at this lengthy discussion on the Anthrogenica Forum.
We were finally treated to a very interesting talk on cloud computing for genealogists by Dick Eastman. Despite being asked to give the talk at very short notice, he still went to great trouble to anglicise his talk by using British English spellings and converting the dollars into pounds, though a few astute Guild members did manage to catch him out and tease him about a few Americanisms that got overlooked! Dick's slides can be downloaded from: http://www.eogn.com/handouts/cloud
Although it is now possible to attend the Guild conference virtually, by watching the livestream or by watching the recordings at a later date, to my mind the best part of the conference is the chance to meet up with our fellow Guild members and to network with them. I was particularly pleased to have the chance to meet some of the Guild members that I “know” on Twitter, including Paul Carter, Amelia Bennett, and Maggie Gaffney. Next year we have vowed to have a tweet-up so that all the Guild members who are on Twitter can get together. There were also many people I would like to have had the chance to meet, but the opportunity did not arise.
The next conference is scheduled to take place from 17th to 19th April 2015 at the Forest Pines Hotel in North Lincolnshire so put the date in your calendar now!
Further reading
- Christine Hancock's report from the Conference
- Dick Eastman's account of his weekend at the Conference
© 2014 Debbie Kennett