Sunday, 30 October 2016

A review of the birth registration process in England and Wales

The Law Commission, the organisation which advises the government on legal reform in England and Wales, is considering reviewing the law on birth certificates in its next round of proposed changes. Birth certificates are an important source for the family historian and if this consultation goes ahead I hope that family historians will have the chance to be involved in the process.

With advances in assisted reproductive technology there are now many complicated scenarios such as egg and donor conception and mitochondrial DNA donation which were not even dreamed about when civil registration began in 1837. The information recorded on birth certificates has changed little since that time so a review of the system is most welcome.

A reform of the birth registration process is particularly important for donor-conceived individuals. The law was changed in 2005 so that donor-conceived children have the right to access the information about their biological parents once they reach the age of 18. This information can be obtained from the Human Fertilisation and Embryology Authority. However, there is no requirement for the parents to disclose so an individual might never know the circumstances of their birth. There are also some parents who do not use official channels to find a donor, and for these children the details of the biological parents might never be recorded. As I argued in a recent paper, the advent of relative-matching DNA tests and the rapid growth of the genetic genealogy databases now mean that donor anonymity can no longer be guaranteed. It would, therefore, make more sense if there was a legal requirement to record the full information about an individual's biological and social parents at the time of conception for those who are using assisted reproductive technologies.

The proposed review is going to look at wider issues about the role of birth registration in contemporary society. What is the purpose of a birth certificate? Why do we have birth certificates and for whose benefit are they kept? What do you think?


Saturday, 29 October 2016

Genetic Genealogy Ireland 2016

I returned home on Tuesday after an enjoyable few days in Ireland attending the Back To Our Past Show. This event is held at the RDS in Dublin, and is the highlight of the genealogical year in Ireland. An over-fifties show takes place at the same time, and this year there was also a coin collectors' fair being held at the RDS. Around 20,000 people visit the RDS over the course of the three days to attend these shows, and visitors can move freely between the three different events. This year the halls seemed to be busier than ever and I imagine that attendance will be up on previous years.

Genetic Genealogy Ireland, sponsored by Family Tree DNA, is now in its fourth year and has become an integral part of Back To Our Past. My friend and fellow ISOGG member Maurice Gleeson is the inspiration behind Genetic Genealogy Ireland. He not only arranges the lecture schedule but also chairs all the sessions with boundless energy and good humour. Once again Maurice provided a fabulous programme for us with a good mix of speakers from both academia and the world of genetic genealogy. I was invited to do a talk on the future of autosomal DNA testing which was great fun to put together. Most of the talks will eventually be made available on the Genetic Genealogy Ireland YouTube channel, but I will just mention briefly some of my personal highlights. I will update this blog post with links to the recordings as and when they become available.

René Gapert and Jim Barry gave a fascinating presentation about the pioneering Earls of Barrymore DNA Project. This is the first privately sponsored project to extract ancient DNA from ancestral remains. Jim Barry, who runs the Barry DNA Project at Family Tree DNA, is the driving force behind this project and he joined us by Skype from his home in Reston, Virginia, to give his part of the presentation. There are currently no protocols for digging up ancestors and testing their remains so this project is exploring uncharted territory. Genealogical research is not of interest to population geneticists, and academics will only get involved in such projects if there is a historical incentive or if the research will demonstrate a new methodology. Jim was unable to get any of the commercial and academic labs (eg University College Dublin) to collaborate, and he had great difficulty in finding a lab that would do the testing. The testing was eventually done at Family Tree DNA but this was only possible because Fiona Monosmith, a technician at FTDNA, took a particular interest in the case. She has since left the company.

René Gapert is a forensic anthropologist. He explained how it was necessary to get the relevant licences from the appropriate authorities to go ahead with the work. In Ireland the coroner determines whether remains are forensic or historical. Coroners are not interested in historical remains. Ancient remains have to be reported to the National Museum in Ireland. There has to be a good reason for testing human remains. Genealogical curiosity is not a good enough reason for testing. The Barrymore case was judged to be of historical significance because of the importance of the Anglo-Irish Barry family in Irish history. It also helped that there was a well established Y-DNA project with many samples available for comparison purposes.

A limited number of Y-STRs were obtained from the samples but the results were somewhat inconclusive. DNA samples were taken from the thigh bones, but ancient DNA research has advanced since the testing was done and it has now been established that the petrous bones (the bones in the skull which protect the inner ear) are the best source of endogenous DNA. It's likely that further DNA testing will be done in the future, and it is also hoped to get a facial reconstruction done by researchers at Dundee University, who specialise in these techniques. The eventual aim is to publish the results in a good-quality peer-reviewed journal.

PDFs of René Gapert's slides are available on ResearchGate and Academia.edu.

Forensic anthropologist René Gapert discusses the background to the Barrymore Project

The Barrymore Project sparked some interesting discussions over the course of the weekend, particularly with regard to the methodology used. Two other speakers, Dan Bradley and Jens Carlsson, explained how DNA is degraded over time and is often reduced to short fragments of around 50 base pairs in length. STRs are repeating motifs of DNA letters and they often cover 150 or more base pairs. This raises questions about the validity of the STR results obtained. Testing on ancient remains is usually done in a specialist ancient DNA lab to avoid problems of contamination with modern DNA, so there are also questions as to whether or not Family Tree DNA were best equipped to do this type of testing. Ancient DNA testing is normally done these days by using next generation sequencing. The old PCR methods have a tendency to amplify contaminating modern DNA. Regardless of the limitations of the methodology used, Jim Barry is to be congratulated for attempting such a pioneering project, and there is much to be learnt from the process which will pave the way for similar projects in the future. A small group of genetic genealogists are hoping to collaborate to try to come up with some best practice guidelines for testing ancestral remains. We hope to seek input from people working in the ancient DNA field. If you think you can help, do get in touch.
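
The mismatch between fragment length and STR length can be made concrete with a minimal sketch. It assumes, as a simplification, that a single fragment has to contain the whole repeat region before the repeat count can be read. The fragment lengths and the helper name `fraction_spanning` are illustrative assumptions, not data or code from the Barrymore Project.

```python
def fraction_spanning(fragment_lengths, locus_length):
    """Return the share of fragments long enough to cover a locus end to end.

    Simplifying assumption: a fragment can only report the repeat count
    of an STR if it is at least as long as the whole repeat region.
    """
    if not fragment_lengths:
        return 0.0
    long_enough = [n for n in fragment_lengths if n >= locus_length]
    return len(long_enough) / len(fragment_lengths)


# Hypothetical fragment lengths (in base pairs) for a degraded sample.
ancient_fragments = [30, 40, 50, 60, 70]

# Share of fragments that could cover an STR spanning 150 base pairs.
print(fraction_spanning(ancient_fragments, 150))

# Share that could cover a shorter 45 bp target (an arbitrary illustrative length).
print(fraction_spanning(ancient_fragments, 45))
```

With fragments of around 50 base pairs, none is long enough to cover a 150 base pair STR, which is why the speakers' figures raise doubts about STR results from degraded samples.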

The Barrymore presentation is now available on the Genetic Genealogy Ireland YouTube channel.

Professor Dan Bradley from Trinity College Dublin spoke about “Recent findings in ancient Irish DNA”. This talk summarised some of the recently published ancient DNA research from Ireland, much of which has come from Dan Bradley's own lab, including the landmark paper on Neolithic and Bronze Age migration to Ireland which presented the first ancient genomes from Ireland. We now know that farming comes with people, and it seems that the farmers displaced much of the Neolithic population. Bradley suggested that this might be because they brought diseases with them such as the plague which knocked out the settled populations. Bradley's lab is working on an ancient genome survey of Ireland and this project is well under way. They are sequencing “tens of genomes” but he has no idea at the moment when the results will be published.

Dan Bradley discusses recent findings in Irish ancient DNA

Peter Sjölund, one of the administrators of the Sweden DNA Project at Family Tree DNA and one of the founders of the Swedish Society for Genetic Genealogy, gave an entertaining presentation on “Viking DNA in Ireland” with some wonderful graphics. Around 20,000 people have tested in Sweden, but the focus is mostly on Y-DNA and mtDNA. Sweden is a "paradise for genetic genealogy". Genealogical records are intact for the whole country dating back to the 1680s and there are court records going back to 1535. However, hereditary surnames have only been used in the last 150 years which is why mtDNA is just as important as Y-DNA in Sweden. Peter presented a case study where he had been able to triangulate two matrilines back for 11 generations to the 1650s, and the genealogical research was confirmed with mtDNA testing. Eighty per cent of Swedish men are R1a, I1 and R1b. Swedish R1b men get very few matches. The admins of the Sweden DNA Project collaborate very closely with their fellow project admins in Norway, Finland and Russia. Peter estimates that around 8000 people have tested in Norway, around 9000 people in Finland but just a few hundred in Denmark for reasons which are unknown. A member of the audience joked that a lot of sperm donors come from Denmark!

Peter Sjölund discusses Viking DNA in Ireland

On Saturday morning I attended the first talk in the main genealogy programme which was presented by Mike Mulligan from AncestryDNA and Sheila O’Donnell, who is one of the Ancestry Progenealogists. Sheila gave a very useful overview of the key record sets for Irish research with particular reference to those that are available from Ancestry. In addition to the records on Ancestry many Irish records are freely available online such as the Irish censuses for 1901 and 1911, and the historic birth, marriage and death records. The Irish Catholic Parish Records are an important source for Irish research. In the 1861 census 78% of the population were Catholic, and this figure had increased to 89% in 1891. The Catholic records are available on both Ancestry and Findmypast. I learned a few interesting facts. There were 67,000 Irish people living in England in the 1901 census. In the 1855 census of New York one quarter of the people living in Manhattan were born in Ireland.

There was not much time for Mike Mulligan to talk about AncestryDNA, but I picked up a few interesting nuggets. Apparently people from the west of Ireland and those living in rural areas get more matches than those living elsewhere in Ireland, probably because there was lots of emigration to America from the west coast, and Americans are the dominant population in the databases. People from the west coast have an average of 130 fourth cousin matches but people on the east coast have about 70 or so matches. This compares with people from England who have between about 20 and 30 fourth cousin matches. This tallies with my own experience at AncestryDNA, as I currently have 29 fourth cousin matches. (In reality fourth cousin matches can be anywhere between a fourth and a sixth cousin.)

Robert Casey gave an interesting presentation on “Y-SNPs: key to the future”, which will appeal to advanced genetic genealogists. He discussed the problem of convergence whereby two haplotypes change over time and drift close together creating coincidental matches. Robert then went on to discuss a methodology for clustering matches into sub-groups based on SNPs, STRs and surnames. He is currently working with a computer programmer and is hoping to come up with a tool to produce automated charts.

In the discussion afterwards we talked about identifying modal haplotypes for specific subclades. It's particularly helpful for surname project administrators to know the modal haplotype so that they can identify off-modal markers that are likely to be informative for their surname clusters. Diana Gale Matthiesen compiled a list of modal haplotypes for various haplogroups. Her list has not been updated for some time, but it is still a potentially useful source. It would be good if we could have an updated list of modal haplotypes for all the different haplogroups in the ISOGG Wiki. If anyone is up to the challenge, do get in touch.

Robert Casey discusses advances in Y-SNP testing and analysis

Jens Carlsson from University College Dublin gave a fascinating presentation about the “Genetic identification of the 1916 Cork Rebel Thomas Kent”. Thomas Kent was one of the 16 men executed by British forces in the aftermath of the Easter Rising. He was buried in the grounds of what is now Cork Prison but there had always been uncertainty about his identity, which was based purely on circumstantial evidence. This work came about as a result of an archaeological dig at Cork Prison during which the putative remains of Thomas Kent were exhumed. The Garda and Forensic Science Ireland contacted University College Dublin and asked for their help with the identification. Thomas Kent has two living nieces and their DNA was used for comparison. Because DNA degrades over time it's generally only possible to retrieve small fragments of ancient DNA ranging in length from 30 to 70 base pairs. The short length of the DNA fragments also means that traditional IBD methods of determining kinship cannot be used. For this project Carlsson’s team developed a brand-new methodology for estimating relatedness using small amounts of genetic data. The methodology was checked by using computer simulations on data from the 1000 Genomes Project. The researchers concluded that there was less than a one in a million chance that they were wrong. The nieces and Thomas Kent were five trillion times more likely to be related than not related. A paper has been submitted for publication and is available on the bioRxiv preprint server. I hope a recording of this presentation will be made available. If so, I highly recommend that you watch it.

Jens Carlsson explains a new methodology that was used to identify the remains of the Easter Rising rebel Thomas Kent

The highlight of the conference for me was a very moving presentation from Diahan Southard on “The marriage of genetics and genealogy: a case study”. Diahan’s mother was adopted from an unmarried mothers’ home in Seattle, Washington. Through a combination of genetic matches and genealogical research Diahan's mum was eventually reunited with some of her biological family. The talk raised some interesting ethical issues showing how advances in technology mean that DNA testing can sometimes have unanticipated consequences. Diahan ended the presentation with a short video which reduced some of the audience to tears. This talk was not recorded, but I understand Diahan will be presenting at RootsTech so, if you get a chance, do go and hear her story. (Update: this talk was recorded after all and the recording will be available online for about six months.) Diahan also gave an excellent talk on the basics of autosomal DNA testing. She has a good eye for design and produces some wonderful whizzy slides which were the envy of all the other speakers.

Diahan Southard shared the story of her mother's reunion with her biological family

Maurice Gleeson gave a thought-provoking talk on the use of SNPs and STRs in the Gleeson Project and his attempts to link the results into the Irish annals. Genetic genealogists use the term NPE to describe so-called non-paternity events where the surname does not correspond with the transmission of the Y-chromosome. We’ve always thought this term is less than satisfactory, but no one has ever been able to come up with a suitable alternative. The term misattributed paternity is used in cases where the results don’t match as expected. However, most NPEs are not surprises but are well documented illegitimacies, and the family historian knows in advance that his results will not match other people with his surname. Maurice proposed two new alternatives – “breaks in transmission” and “surname switches” – both of which I quite like. Perhaps these new names might one day catch on.

Maurice Gleeson proposed some alternative terminology for NPEs

The final talk of Genetic Genealogy Ireland 2016 was given by Ed Gilbert from the Royal College of Surgeons in Ireland. Ed gave us an update on the Irish DNA Atlas Project and the Irish Travellers Project. Work is ongoing on the Irish DNA Atlas Project so the talk was not recorded, and we weren't allowed to take photographs. I can't say too much about the results that were presented other than that this is a very exciting project. To qualify for the project participants must have eight great-grandparents born within 30 to 50 kilometres of each other. The genealogies are verified by researchers from the Genealogical Society of Ireland. Only one in eight of all applicants are accepted. If the criteria were relaxed they could have over 2000 participants. The team have retained all the contact details so that they can contact all these people in the future if required. The project currently has 230 individuals, and they have genetic data on 194 participants. It has been possible to identify distinct regional clusters within Ireland. The Irish data has been compared with data from the People of the British Isles Project and a preliminary comparison has been done with European data. It is hoped that a paper will be submitted by the end of this year or the beginning of next year when all the analyses have been completed. We are all looking forward to seeing this paper in print.

Unfortunately, because of the terms and conditions agreed when the Irish DNA Atlas Project was set up, the data will not be made available to other researchers. Also, the people who have participated in the project will not have access to their own raw data. However, if you do have four grandparents all born within the same region of Ireland you can participate in another project run by the British company Living DNA. For details see this article in the Irish Post. If you have tested at Family Tree DNA and have four grandparents born within the same region of Ireland you can join Maurice Gleeson's Irish Grandparents Project.

Ed Gilbert also briefly summarised the results of the Irish Travellers Project. Irish Travellers represent around 0.6% of the Irish population. Fifty individuals participated in the project. They each had a minimum of three grandparents with a Traveller surname. Sixty per cent of the participants were female. The analysis was based on autosomal DNA, and they currently do not have any Y-DNA or mtDNA data. Ed Gilbert presented a poster about the Irish Travellers Project at the recent meeting of the American Society of Human Genetics. A paper has been submitted for publication and they are currently dealing with reviewer comments so hopefully the paper will be published in the next few months.

Ed Gilbert presents some preliminary results from the Irish DNA Atlas Project

As there were so many exciting talks this year I didn't get much of a chance to look at the various stands at the show but I did manage to escape briefly and take a few photographs. DNA testing was very much at the forefront this year. Family Tree DNA had the market in Ireland to themselves until 2015 when AncestryDNA launched their autosomal test in Ireland and Britain. This year Family Tree DNA and AncestryDNA were joined by a third company, Living DNA, who are selling an interesting new genetic ancestry test that offers regional breakdowns. The presence of three DNA companies at this event is a sign that DNA testing is now starting to go mainstream. The DNA stands all seemed to be very crowded and I'm sure a lot of kits were sold.

Crowds gather on the Family Tree DNA stand
The AncestryDNA stand
The Living DNA stand

The day after the conference had finished we met up for the ISOGG Day Out, organised by Gerard Corcoran, ISOGG's regional rep for Ireland. Gerard always pulls out all the stops for us but I will write about this in a future blog post.

Update 17th December 2016
Recordings of all the available lectures can now be watched free of charge on the Genetic Genealogy Ireland YouTube channel.

© 2016 Debbie Kennett

Wednesday, 26 October 2016

Family Tree DNA and Assassin's Creed The Movie

Family Tree DNA has teamed up with 20th Century Fox to offer a special DNA testing package which will be promoted with the forthcoming action adventure film Assassin's Creed.

For the duration of the promotion it will be possible to purchase a special Assassin’s Creed DNA Testing Bundle for $89 which includes a Family Finder test, a Warrior Gene test and a one-month premium subscription to Findmypast.

There is also a competition (what they have called a "sweepstakes") to win a trip for two to Las Vegas for an "Assassin’s Creed-themed adventure". The competition appears to be open worldwide but note that the prize only includes domestic flights in the US so if you were one of the lucky winners you would have to pay your own air fare to the US.

The film is released worldwide on 21st December but the tests are available with immediate effect and the competition has already started.

The Warrior Gene is interesting because it's transmitted on the X-chromosome. At one time Family Tree DNA offered a standalone Warrior Gene test. Jobling et al comment on the Warrior Gene in their article In the blood: the myth and reality of genetic markers of identity (Ethnic and Racial Studies 2016 39(2): 142-161):
The enzyme monoamine oxidase A (MAOA) degrades a subset of neurotransmitters including serotonin, epinephrine, and norepinephrine – molecules that transmit information from one neuron to another. Adjacent to the MAOA gene is a region of DNA that controls how much enzyme is produced, and a common variant of the length of this region (called 3R) leads to reduced production of enzyme compared to other common versions (Sabol, Hu, and Hamer 1998). The gene lies on the X chromosome, so males, who have only one X, show the simplest relationship between the version of the gene they carry and its behavioural consequences. Men carrying the 3R version (the ‘warrior gene’) are more likely to respond aggressively to maltreatment or stress (Caspi et al. 2002). Despite charging almost 100 dollars for the ‘warrior gene’ test, the testing company calls the association between gene variant and behaviour a ‘factoid’, and best used as a ‘cocktail conversation starter’. Nonetheless we might wonder if the results of the test have any influence on the behaviour of people who are tested; the possible influence of the 3R variant was used in 2009 as part of a successful criminal defence in the USA (Brooks-Crozier 2011), and made the difference between thirty-two years’ imprisonment and the death penalty.
See also this excellent article by Adam Rutherford for the New Statesman on Why we can't blame "warrior genes" for violent crimes. (Thanks to Ann Turner for alerting me to this article.)

23andMe and AncestryDNA are already advertising on TV and, as DNA testing goes mainstream, it's important that Family Tree DNA promote their products on mass media to keep up with the competition. So whatever you might think about the Warrior Gene test it's good news that Family Tree DNA are now advertising in cinemas and actively promoting the Family Finder test. This will help to familiarise people with the company name, and perhaps introduce a new demographic to DNA testing who might not otherwise have considered buying a test.

To learn more about the Assassin's Creed package and the competition visit:

You need to scroll right down to the bottom of the page to find the information about the competition.

Here is the official press release from Family Tree DNA and 20th Century Fox.
Family Tree DNA and 20th Century Fox Team Up for Historical Adventure 
Genetic genealogy pioneers announce exciting partnership with the theatrical release of Assassin’s Creed. 
Houston, Texas — October 25, 2016:

In association with the upcoming theatrical release of the epic adventure film ASSASSIN’S CREED, in theaters December 21, Family Tree DNA is pleased to announce a new partnership with 20th Century Fox and Findmypast, which features the Assassin’s Creed DNA Testing Bundle and Assassin’s Creed Sweepstakes. 
Loosely based on the popular video game franchise of the same name, and starring award-winning actors Michael Fassbender and Marion Cotillard, the movie’s main character Callum Lynch—through a revolutionary technology called the Animus—travels deep into the past to discover that his genetic ancestor, Aguilar, was part of a mysterious secret organization, the Assassin’s, in 15th Century Spain. The action-adventure follows Callum as he relives Aguilar's memories in present day.

As pioneers in the direct-to-consumer DNA testing industry, Family Tree DNA was tapped by 20th Century Fox to be the exclusive testing partner for the film. The company’s premier suite of DNA tests along with the world’s most comprehensive matching database enable users to trace their lineage through time, explore ancestry and connect with relatives across the globe.

Family Tree DNA Director of Product Development, Michael Davila, noted that “The opportunity to partner with 20th Century Fox on the release of Assassin’s Creed is not only exciting but serendipitous. The storyline of Callum Lynch connecting to his ancestral past ties in completely with what our company does in helping people discover their origins and explore family history,” said Davila. 
“We are excited to be partnering with Family Tree DNA,” said Zachary Eller, Senior Vice President, Marketing Partnerships, 20th Century Fox. “They provide a fantastic opportunity to bring the central themes of Assassin’s Creed to a real world application by allowing consumers to actually discover their past.” 
With the purchase of the special limited-time Assassin’s Creed Bundle, customers will be mailed a sample collection kit which, when processed, will provide both Family Tree DNA’s signature Family Finder test and the Warrior Gene DNA test. They will also receive a free one-month premium subscription to Findmypast’s online genealogy service. 
According to Belinda Hanton, Global Head of Partnerships at Findmypast, “We are thrilled to be teaming up with Fox and Family Tree DNA to promote family history research and genetic genealogy. It’s partnerships like this that allow us to speak to completely new audiences and help spread the word that anyone can start exploring their heritage at the click of a mouse. The lives of our ancestors are not only recorded in historical records, but are also written in our DNA and it is now easier than ever before to unlock the incredible stories hidden in our families’ past.” 
Using a simple cheek swab and step-by-step instructions, users return the sample collection test kit by mail, in a provided envelope, directly to Family Tree DNA. Results typically take four to five weeks and are delivered through a private customer dashboard with email notification. Unlike other testing companies, Family Tree DNA results are kept completely confidential and secure privacy settings put users in control of how much information they choose to share.

Family Finder is an autosomal (non-sex) DNA test that finds matches within five generations and includes myOrigins, a powerful mapping tool that provides a detailed geographic and ethnic breakdown of personal genetic ancestry. The Warrior Gene test determines whether a person carries the Monoamine Oxidase A (MAOA) gene variant, dubbed the “Warrior Gene,” which some researchers say may cause certain carriers to engage in more risk-taking behaviors and be able to better assess their chances of success in critical situations. 
Together with the Assassin’s Creed DNA Testing Bundle is the Assassin’s Creed Sweepstakes and a chance to win a Grand Prize trip for two to Las Vegas for an Assassin’s Creed-themed adventure. The experience includes a series of high-octane Assassin’s Creed-inspired activities like a master parkour class, nighttime zip lining and an electrifying sky jump from the tallest tower in the city.

Although no purchase is necessary to enter the contest, purchasing the Assassin’s Creed Bundle earns customers ten additional entries into the Sweepstakes for a greater chance to win a trip to Las Vegas as well as other prizes. Followers will also have the opportunity to earn bonus entries by sharing Sweepstakes social posts on their Facebook and Twitter pages. 
With the exclusive DNA Testing Bundle and Sweepstakes movie tie-in, Assassin’s Creed fans everywhere will be able to jump back in time, embrace their inner warriors and unlock their genetic memories.

“The partnership between Fox’s Assassin’s Creed and Family Tree DNA is a perfect fit,” Davila said. “Test-takers get to find out if they carry the “Warrior Gene” in their DNA, and while they’re at it, will be able to delve into the exciting world of genetic genealogy and discover their own family histories…all through DNA. Everyone has a story to tell…so it’s an absolute win-win scenario.”

Wednesday, 19 October 2016

My pick of the abstracts and posters from ASHG 2016

The American Society of Human Genetics is holding its annual conference from 18th to 22nd October in Vancouver, Canada. The Platform and Poster Abstracts are now available online. The research presented at this meeting gives a taste of some of the publications and developments to come in the next year or so. There are a number of abstracts that are of particular interest to genetic genealogists. In particular I note that AncestryDNA are presenting a number of interesting posters which hint at some new tools that might be on the way. I've highlighted below my picks from the conference programme.

23andMe will also be at the ASHG meeting. They have published a list of the abstracts for their presentations and posters on their blog, though none of the content is of direct interest to genetic genealogists. 

Platform Abstracts

Ultra-fine structural inference and population assignment using IBD network clustering and classifiers accurately assign sub-continental origins represented in a large admixed U.S. cohort.
E. Han, R. Curtis, P. Carbonetto, K. Noto, J. Byrnes, Y. Wang, J. Granka, A. Kermany, K. Rand, E. Elyashiv, H. Guturu, N. Myres, E. Hong, C. Ball, K. Chahine. DNA, LLC, San Francisco, CA.

Motivation & Objectives: Identifying the geographic origin of individuals using genetic data has broad application in forensics, human disease and evolution. There have been multiple methods proposed to achieve this goal, such as Principal Component Analysis (PCA), Spatial Ancestry Analysis (SPA) and Geographic Population Structure (GPS). However, most methods suffer from decreased prediction accuracy outside Europe and do not apply to the US population comprised of admixed immigrants. In this study, we describe a new method and demonstrate its accuracy in predicting geographic origins in the US post-European colonization or internationally for single origin and admixed samples.
Methods: We use a database of over 1.5 million consented genotype samples collected from the US and internationally, along with samples from public databases such as POBI. We build a genetic network by estimating the amount of identity-by-descent (IBD) sharing between all individuals. By iteratively applying the Louvain method for community detection, we find a hierarchy of genetic clusters in the network. Leveraging user-generated pedigrees going back 6-8 generations, we annotate each cluster with birth locations that are enriched in historical time periods. The birth locations of these clusters are generally specific to locations in the US or internationally, allowing for concise geographical interpretation. Although community detection results assign samples to only one cluster, we use machine learning classification to assign samples to multiple clusters. Given this classification and enriched birth locations, we identify the likely geographic origins of each sample.
Results: Our results include over 300 stable clusters, each comprised of more than 1000 samples. Some clusters correspond to narrow geographical regions, such as people descended from southern West Virginia in the 19th century, and others to broader groups, such as European Jews from Poland. By using the associated pedigrees, we demonstrate the accuracy of these predictions: over 95% of the assigned individuals have at least one known ancestor born in the enriched region defined by most clusters.
Conclusion: By utilizing large-scale genetic data with associated pedigrees, we have developed the first method for predicting the geographic origin of individuals within the US or internationally with high accuracy. This approach can be used for ultra-fine-scale genetic ancestry mapping in any population.
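The Louvain method mentioned in the Methods works by greedily maximising a quantity called modularity, which scores how much more IBD sharing falls inside clusters than would be expected by chance. The abstract gives no implementation details, so the following is only a minimal sketch of the modularity score itself, not the authors' pipeline. The function name, the toy network and the edge weights (imagined here as shared IBD) are all assumptions made for illustration.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Weighted modularity Q of a partition of an undirected graph.

    edges: iterable of (u, v, weight); weight could be shared IBD (hypothetical)
    community_of: dict mapping each node to a community label
    """
    total_weight = 0.0            # m: sum of all edge weights
    intra = defaultdict(float)    # edge weight falling inside each community
    degree = defaultdict(float)   # summed node strength per community
    for u, v, w in edges:
        total_weight += w
        degree[community_of[u]] += w
        degree[community_of[v]] += w
        if community_of[u] == community_of[v]:
            intra[community_of[u]] += w
    if total_weight == 0:
        return 0.0
    # Q = sum over communities of [ L_c / m - (d_c / 2m)^2 ]
    return sum(
        intra[c] / total_weight - (degree[c] / (2 * total_weight)) ** 2
        for c in degree
    )


# Toy network: two tightly connected triangles joined by a single edge.
toy_edges = [
    ("a", "b", 1.0), ("b", "c", 1.0), ("a", "c", 1.0),
    ("d", "e", 1.0), ("e", "f", 1.0), ("d", "f", 1.0),
    ("c", "d", 1.0),
]
two_clusters = modularity(
    toy_edges, {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
)
one_cluster = modularity(toy_edges, {n: 0 for n in "abcdef"})
```

Splitting the toy network into its two triangles scores higher than lumping everyone together, which is the kind of improvement a community detection method searches for at each step.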

A massively scalable phenotyping approach using social media for genetic studies.
J. Yuan1,2, A. Gordon1, D. Speyer1,2, D. Zielinski1, R. Aufrichtig1, J. Pickrell1,3, Y. Erlich1,2. 1) New York Genome Center, New York, NY; 2) Computer Science, Columbia University, New York, NY; 3) Biological Sciences, Columbia University, New York, NY.

While DNA sequencing is largely a tractable problem, massive phenotyping is still a challenge, especially for Internet-based studies. Traditional methods, such as physical exams, scale poorly for large numbers of individuals. Questionnaires are easier to collect, but administering lengthy or frequent questionnaires creates a negative experience for participants, leading to lower completion rates. Electronic health records are a great resource for phenotypes, but they exhibit large heterogeneity when collected from various resources and are subject to an array of confidentiality restrictions that complicate their collection. Recent studies have highlighted the value of obtaining digital phenotypes by interpreting the interactions of users with digital outlets as a reflection of underlying traits. In particular, these studies have shown that social media data enables the collection of various phenotypes including Big Five personality traits, sexual orientation, sleeping patterns, and even heart rate from regular user videos. The ubiquity of the data and its ease of collection through standard APIs enable a new methodology for large scale phenotypic collection. Here, we report our ongoing efforts to enable participants to donate their social-media data along with their genomes in order to understand the genetics of digital phenotypes. In our previous work, we developed DNA.Land, an online platform where users may register and securely contribute their Direct to Consumer genomic data, as well as receive reports of ancestry and shared relatives with other DNA.Land users. Since our launch at ASHG 2015, we have obtained over 20,000 users, many of whom have been eager to share personal information such as family history. We are now building a new component in DNA.Land in which users can contribute their Facebook data for scientific studies.
We will present our IBM Watson-based system to predict traits from social media data and will describe the type of information DNA.Land users will receive. In addition, we will discuss the particular challenges in collecting this data with respect to both computational efforts and privacy concerns. Our approach is applicable for other types of large scale efforts such as the Precision Medicine Initiative and can easily scale to millions of people.

Poster Abstracts

Insights into the geographical distribution of genetic admixture of unrelated volunteer donors and recipients of stem-cell transplants.
A. Madbouly 1, K. Besse 1, Y. Wang 2, J. Byrnes 2, C. Ball 2, N. Myres 2, M. Maiers 1. 1) Bioinformatic Research, National Marrow Donor Program, Minneapolis, MN; 2), San Francisco, CA, USA.

Genetic ancestry of self-described groups may vary across geographic locations in the US, a phenomenon documented anecdotally but not thoroughly explored in the literature. We studied the genetic ancestry of 995 HLA matched donor/recipient (DR) pairs from the Be The Match® registry with a focus on regional ancestry differences among ethnic groups. We hypothesized that, along with historical events, donor/transplant center distribution and socioeconomic factors might influence the geographical spread of some genetic admixtures. We genotyped 995 DR pairs on the Illumina OmniExpress chip with approximately 730,000 SNPs. Self-reported race and ethnicity was collected for donors at the time of registry recruitment. Recipients’ race and ethnicity was recorded at the transplant hospital once at the time of diagnosis and again after transplant. The majority of the study cohort (94%) self-identified as European Caucasian (CAU). The rest identified as Hispanic (HIS) (3.5%), African-American (1%) and Asian or Pacific Islander (1.5%). Address zip code information was available for 99% of recipients but only 59% of donors. Genetic ancestry was estimated by applying the AncestryDNA ethnicity estimator pipeline, which provides a vector of 26 admixtures. Some admixtures were combined for the analysis due to small counts and minimal impact such as detailed African (AFR) admixtures. We then mapped the geographical distribution of European (EUR) and non-EUR genetic admixtures for self-reported CAU and non-CAU individuals, optimizing geographical regions for subject privacy. The main self-reported race groups showed average proportions of AFR and EUR admixtures compatible with Bryc and colleagues (2015). However, our results revealed larger Amerindian admixture in self-reported HIS, especially among recipients. When stratifying regionally, systematic differences emerged in admixture distribution among similar race groups mostly interpretable by historic events. 
Separating donors and recipients suggested possible additional influences, such as donor and transplant center geographical spread. Importantly, we observed differences in the distribution of non-majority admixtures such as increased AFR admixture in self-reported CAU donors (but not recipients) in some southern states suggesting a possible socioeconomic link. This work has the potential of guiding stem-cell donor registry strategies on volunteer donor recruitment and donor and transplant center planning.

Geographic and historic changes in runs of homozygosity among more than 1,000,000 individuals sheds light into the recent demographic history of US population.
A. Kermany, C. Ball, J. Byrnes, P. Carbonetto, K. Chahine, R. Curtis, E. Elyashiv, J. Granka, H. Guturu, E. Han, E. Hong, N. Myres, K. Noto, K. Rand, Y. Wang. DNA, LLC, San Francisco, CA.

Runs of Homozygosity (ROH) are indicators of segments of chromosomes identical by descent between parental haplotypes. Distribution of such runs along the chromosome contains information regarding the demographic history of the population under study, in particular it reveals trends in consanguinity. In this study, we analyze the distribution of runs of homozygosity – chromosomal locations, number of runs and lengths of runs - as well as estimated inbreeding coefficient (F) among more than 1,000,000 consented AncestryDNA customers. We report on observed variations in distribution of ROH based on geographic origins - inferred from the available pedigree data – admixture proportions as well as birth year cohort. In particular, we present our results on variations in the distribution of ROH within 19 communities within the US population - identified based on analysing a network of genetic matches in the database - and investigate differences in patterns of ROH between each group and comment on the inferred demographic history within each group.
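To make the idea of a run of homozygosity concrete, here is a minimal sketch that scans one chromosome for stretches of consecutive homozygous genotypes and turns them into an inbreeding estimate. The abstract does not say how the runs were called or which estimator of F was used, so the thresholds, the function names and the use of F_ROH (the fraction of the genome covered by runs) are assumptions for illustration only. Missing genotypes and genotyping error are ignored.

```python
def find_roh(positions, genotypes, min_snps=3, min_length=2000):
    """Return (start_bp, end_bp) for each qualifying run of homozygous calls.

    positions: base-pair positions of the SNPs, in ascending order
    genotypes: alternate-allele counts (0, 1 or 2); 1 means heterozygous
    min_snps, min_length: illustrative thresholds, not values from the abstract
    """
    runs, start = [], None
    # A sentinel heterozygous call at the end closes any run still open.
    for i, g in enumerate(list(genotypes) + [1]):
        if g != 1 and start is None:
            start = i                       # a new homozygous run begins
        elif g == 1 and start is not None:
            end = i - 1                     # last homozygous SNP of the run
            long_enough = positions[end] - positions[start] >= min_length
            if end - start + 1 >= min_snps and long_enough:
                runs.append((positions[start], positions[end]))
            start = None
    return runs


def f_roh(runs, genome_length):
    """Fraction of the analysed genome length that lies inside the runs."""
    return sum(end - start for start, end in runs) / genome_length


# Toy chromosome of ten SNPs spaced 1,000 bp apart.
toy_positions = [0, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000]
toy_genotypes = [0, 0, 2, 0, 1, 2, 2, 1, 0, 0]
toy_runs = find_roh(toy_positions, toy_genotypes)
toy_f = f_roh(toy_runs, genome_length=10000)
```

In the toy data only the first four SNPs form a run that passes both thresholds; the two shorter homozygous stretches are discarded.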

Y-chromosomal sequencing and screening reveal both stability and migrations in North Eurasian populations.
O. Balanovsky 1,2, V. Zaporozhchenko 2,1, A. Agdzhoyan 1,2, I. Alborova 5, M. Kuznetsova 2, V. Urasin 3, M. Zhabagin 4, M. Chukhryaeva 2,1, Kh. Mustafin 5, C. Tyler-Smith 6, E. Balanovska 2. 1) Vavilov Institute of General Genetics, Moscow, Russian Federation; 2) Research Centre for Medical Genetics, Moscow, Russia; 3) YFull service, Moscow, Russia; 4) National Laboratory Astana, Nazarbayev University, Astana, Republic of Kazakhstan; 5) Moscow Institute of Physics and Technology (State University), Moscow, Russia; 6) The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom.

Y-chromosomal markers exhibit the highest interpopulation diversity in the genome and thus form one of the most informative tools for tracing population history. However, their information value depends on discovering SNPs which subdivide haplogroups with broad geographic distribution into branches revealing fine population structure. Progress in such discoveries has recently moved from a slow linear phase to a rapid exponential phase due to NGS. We applied this approach to the Y-chromosomal pool of North Eurasian populations and concentrated on haplogroups C, G1, G2, N1b, N1c, and R1b. We sequenced 181 Y-chromosomes (capturing 11 Mb from each sample), developed the NGSConv software for calling Y-chromosomal SNPs, and identified roughly 2,500 SNPs, most of which were new. Then we constructed phylogenetic trees and dated dozens of their branches using our estimates of the mutation rate. The last – but not the least – step included screening branch-defining SNPs in the entire Biobank of indigenous North Eurasian populations (led by prof. Elena Balanovska), which includes 26,000 samples from 260 populations. This screening resulted in frequency distribution maps of 29 branches of haplogroups R1b and C, thus increasing the phylogenetic resolution by an order of magnitude compared to the two initial haplogroups. For haplogroup R1b, we identified a previously unstudied “eastern” branch, R1b-GG400, found in East Europeans and West Asians and forming a brother clade to the “western” branch R1b-L51 found in West Europeans. The ancient samples from the Yamnaya archaeological culture are located on this eastern branch, showing that the paternal descendants of the Yamnaya population – in contrast to the published autosomal findings - still live in the Pontic steppe and were not an important source of paternal lineages in present-day West Europeans. 
For haplogroup C-M217 - the predominant paternal component in Central Asians - we found signals of simultaneous expansion in two independent branches. Both expansion times and gene geographic maps of the expanded lineages indicated the emergence of the Mongol Empire as the likely trigger. We conclude that simply discovering new SNPs is not enough, but in combination with screening for the branch-defining SNPs in large biobanks of indigenous populations, it allows comprehensive reconstruction of male population history. The study was supported by the Russian Science Foundation grant 14-14-00827 to OB.

Admixture inference of African Americans and Latinos in the United States through time.
M.L. Spear 1, D.G. Torgerson 2, R.D. Hernandez 1,3,4. 1) Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA; 2) Department of Medicine, University of California, San Francisco, San Francisco, CA; 3) California Institute for Quantitative Biosciences (QB3), University of California, San Francisco, San Francisco, CA; 4) Institute for Human Genetics, University of California, San Francisco, CA.

The study of admixed populations has provided important insights into medical genetics and population history. The genomes of admixed individuals are mosaics of segments originating from different ancestral populations. At the genome-wide level, the proportion of one’s genome deriving from each ancestral population is referred to as “global ancestry proportions”. However, modern statistical methods enable inference of the ancestry at individual SNPs within a genome, “local ancestry”, which allow us to reconstruct the mosaic pattern of ancestry tracts across an individual’s genome. Local ancestry inference is critical for the analysis of admixed genomes and has been widely studied in the fields of medical genetics and human demographic history. Local ancestry tracts can be used to infer migration histories but the question remains how these histories have shaped ancestry proportions over time, particularly in the United States, a “melting pot” country that has faced changing societal norms over the past century. It has yet to be determined how the length distribution of ancestry tracts in admixed individuals has changed over decades as well as how the variation in ancestry proportions across chromosomes and individuals may differ. Thus, we estimated local ancestry for 4,600 Latinos and 2,100 African Americans from the Genetic Epidemiology Research on Adult Health and Aging (GERA) dataset using RFMix. With these local ancestry tracts, we used TRACTS to compare the observed length of the ancestry tracts to predictions of different demographic models of migration scenarios. Individuals were grouped by 5-year birth year categories, and comparisons were made between the demographic models generated from each birth year category. Overall, the local ancestry tracts of African Americans and Latinos from the United States have provided insights into the change in complexity of their genetic structure throughout the 20th century.
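The relationship between local and global ancestry described above can be shown with a small example. This is only a sketch under simple assumptions: each tract is a (start, end, label) tuple on a single haplotype measured in centimorgans, and the ancestry labels and function names are hypothetical. It does not reproduce RFMix or TRACTS, which the abstract names but does not describe in detail.

```python
from collections import defaultdict


def global_ancestry(tracts):
    """Global ancestry proportions from a list of local ancestry tracts.

    tracts: list of (start_cm, end_cm, ancestry_label) for one haplotype
    """
    length = defaultdict(float)
    for start, end, label in tracts:
        length[label] += end - start        # total length per ancestry
    total = sum(length.values())
    return {label: value / total for label, value in length.items()}


def tract_lengths(tracts, label):
    """Sorted lengths of the tracts assigned to one ancestry."""
    return sorted(end - start for start, end, a in tracts if a == label)


# Toy 100 cM haplotype with hypothetical ancestry labels.
toy_tracts = [
    (0, 30, "AFR"),
    (30, 40, "EUR"),
    (40, 90, "AFR"),
    (90, 100, "EUR"),
]
toy_props = global_ancestry(toy_tracts)
toy_eur_lengths = tract_lengths(toy_tracts, "EUR")
```

The global proportions summarise how much of the haplotype each ancestry covers, while the list of tract lengths keeps the extra information that models of migration timing rely on: recombination breaks tracts up over the generations, so older admixture tends to leave shorter tracts.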

Fine-scale population structure in France: Loire River as genetic barrier.
C. Dina 1,2, J. Giemza 1, M. Karakachoff 1,2, F. Simonet 1,2, K. Rouault 3, E. Charpentier 1,2, S. Lecointe 1,2, P. Lindenbaum 1, J. Violleau 1,2, H. Le Marec 1,2, C. Férec 3, S. Chatel 1,2, S. Hercberg 4, P. Galan 4, J-J. Schott 1,2, E. Génin 3, R. Redon 1,2. 1) Thorax Inst, INSERM-CNRS, Nantes, France; 2) CHU Nantes, Nantes University; 3) Inserm UMR 1078, CHRU Brest, University Bretagne Occidentale, EFS, Brest France; 4) Université Paris 13, Equipe de Recherche en Epidémiologie Nutritionnelle, Centre de Recherche en Epidémiologie et Statistiques, Inserm (U1153), Inra (U1125), Cnam, COMUE Sorbonne Paris Cité, F-93017, Bobigny, France.

Background The genetic structure of human populations varies throughout the world, being influenced by migration, admixture, natural selection and genetic drift. Human population structure has first been investigated at broad scales, between and within continents. Currently researchers focus on finer scales, examining genetic structure within countries. Characterising such genetic variation is of interest as it provides insight into demographical history and informs research on disease association studies, especially on rare variants. We here explored the genetic structure of a population living on the French territory (hereafter called French population) both on the whole territory and then on Western part where interesting stratification was identified.
Methods and Results We genotyped two cohorts genome-wide: 2276 individuals with known department of origin from the French population (SU.VI.MAX study) using an Illumina chip, and 456 individuals (PREGO study) from the Atlantic coast of Western France, from Finistère to Vendée, with at least three of their grandparents born within a 15-kilometre distance, using the Axiom CEU chip. With EEMS software we visualised areas with low effective migration rates - the migration barriers, which match with geographical features, with a particularly strong barrier on the lower course of the Loire in Western France. We then focused on the PREGO study and Principal Components analysis revealed that individuals from the same departments form clusters. In both datasets we observed a high correlation between geographical position and components (p-value < 2e-16). Many independent methods support the hypothesis that the Loire River is a genetic barrier. The two groups of individuals, from north or south of the Loire, are well differentiated along the PC1 axis. ADMIXTURE estimated different ancestry proportions for the two groups. The first split of hierarchical clustering returned by fineSTRUCTURE, and the one based on normalized counts of identity-by-descent segments, is between north and south of the Loire.
Conclusion We here report genetic stratification at the level of continental French territory. The migration pattern is following the geographical structure. A specific pattern is noticed around the Loire River. We confirm both evidence for isolation by distance and existence of a genetic barrier, the Loire River. The discovered fine-scale population structure may have consequences in association analyses, especially for rare variants which tend to be geographically clustered.

Identification and characterization of common haplotypes found in a database of one million human genomes.
H. Guturu 1 , K. Noto 1 , J. Byrnes 1 , S. Song 1 , P. Carbonetto 1 , R.E. Curtis 2 , E. Elyashiv 1 , J.M. Granka 1 , E. Han 1 , E.L. Hong 1 , A.R. Kermany 1 , N.M. Myres 2 , K.A. Rand 1 , Y. Wang 1 , C.A. Ball 1 , K.G. Chahine 2 . 1) DNA, LLC, San Francisco, CA; 2) DNA, LLC, Lehi, UT.

Introduction: A common DNA-based method to detect relatives and ancestors (“cousins”) is to identify and match shared portions of chromosomes (haplotype blocks) between an individual and their potential relatives. Identifying and matching the shared haplotype blocks is challenging due to the non-uniform halving of genetic information that takes place during the meiosis events of each generation. As the number of generations increases, the average size of matching haplotype blocks shrinks, due to successive chromosomal recombination. Additionally, genetic drift, flow and selection establish population structure that skews the distribution of frequency and size of some haplotype blocks. We aim to characterize haplotype blocks based on their frequency profiles and link haplotypes to ancestral communities (“genetic ethnicities”) and more recent admixed communities.
Methods: Using a novel haplotype block matching algorithm, we identify haplotype blocks that occur frequently in a database of over one million samples genotyped by DNA, LLC. We review the frequency profiles of each haplotype, and associate them with metadata inferred from global and local estimated admixture ("genetic ethnicity") as well as aggregated family history data from public family trees associated with some of the genotypes.
Results: Common SNP windows have been characterized as identifying signatures of the gamut from ethnicities to more recent admixed communities resulting from migration. Further, we show that these signals of ethnic populations and communities can be used to improve the accuracy of identifying distant “cousin” matches by correcting for matches that are predominately generated due to more ancient signals of ancestry.
Conclusion: By linking common haplotype blocks to ancestral groups of varying age of origin, we can improve the accuracy of ancestor identification for the desired task – ancient haplotype blocks for ethnicity admixture detection to more recent haplotype blocks that reflect recent cousins. Additionally, our characterization of haplotype blocks by ancestral groups reveals interesting candidates for further study and interpretation of their functional implications in various ethnic and community groups.

Maps of effective migration as a summary of human genetic diversity. 
B. Peter, D. Petkova, M. Stephens, J. Novembre. University of Chicago, Chicago, IL.

A dominant pattern of genetic diversity in humans is that geographically proximal populations are generally more genetically similar to one another; however, there are exceptions to this rule. Persistent geographical features such as mountains, oceans, or deserts, have allowed excess genetic differences to accumulate in some regions more than others. Conversely, historical migrations and population movements have led to cases where exceptional levels of similarity persist across large geographic distances. To provide more insight into how genetic differentiation is distributed geographically in humans, we examine the fine-scale genetic structure of humans. We produce maps that represent the spatial structure of human genetic diversity using a recently developed, spatially explicit method (EEMS, Estimation of Effective Migration Surfaces). We apply EEMS on global, continental, and sub-continental scales, analyzing genetic data from 8,740 individuals from 469 geographically localized populations, obtained from 24 different source studies. In addition to the major, well-known barriers such as the Sahara, Himalayas and Mediterranean, we detect barriers that correlate with historic language group boundaries (boundaries of Slavic and Bantu speakers with their neighbors), mountain ranges (Zagros, Caucasus, Ural) and marine features (English Channel, Adriatic Sea, Wallace line). We also identify regions showing high connectivity despite having geographic separation (Britain and Scandinavia, Iceland and Denmark, among the Lesser Sunda Islands). Simultaneously, we find that levels of diversity vary more smoothly, decreasing gradually with distance from Africa. Overall, our results suggest that diversity patterns are consistent and primarily shaped by the signature of the Out-of-Africa expansion, but that migration rates are strongly influenced by geography and local events.

The African Genome Resource Project: Patrilineal and matrilineal inheritance through the Y chromosome and the mitochondrial genome.
F. Abascal, D. Gurdasani, T. Carstensen, M. Pollard, C. Pomilla, M. Sandhu on behalf of AGR investigators. Human Genetics, Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom.

Background The Y chromosome and the mitochondrial genome are inherited from the paternal and maternal lines, respectively. The lack of recombination in the mitochondrial genome and in large part of the Y chromosome leads to evolution almost in isolation from the autosomal genome. As a result, the Y chromosome and the mitochondrial genome offer a unique perspective on human demographic processes. Y chromosome (Y-) and mitochondrial (mt-) haplogroups can be very informative about human origins, migrations and admixture, as well as about potential sex biases during these processes. Further characterisation of the diversity of Y- and mt-haplogroups within Africa is essential to understand human history. Here, we present the mitochondrial and Y chromosome diversity among ~5000 individuals from the African Genome Resource panel.
Methods We predicted the mt- and Y-haplogroups for 4,990 individuals and 2,399 males, respectively, representing diverse ethno-linguistic groups from Ethiopia, Uganda, South Africa, Egypt, and 5 African populations sequenced within the 1000 Genomes project. Mitochondrial and Y haplogroups were predicted with Haplogrep and YFitter, respectively. We called the mitochondrial genome and the Y chromosome for each sample and reconstructed their phylogenetic relationships with FastML.
Results We found evidence for Eurasian admixture among several populations across sub-Saharan populations. Eurasian mt haplogroups appeared in 23% of the Ethiopians and 0.8% of the Ugandans. No Eurasian mt haplogroups were detected for the Zulu and Nama. We identified 13% Ethiopians, 0.5% Ugandan, and 43% Nama/Khoe-Sans with Eurasian Y haplogroups. Eurasian admixture is prevalent in Ethiopia but it is not distributed homogeneously. Whereas the Gumuz show no Eurasian haplogroups, the Amhara show the highest frequencies. Within the Nama/Khoe-San there is not a single Eurasian mitochondrial haplogroup but up to 43% of Eurasian Y haplogroups, revealing a strong sex bias (p=1e-12). Consistent with previous reports, the oldest haplogroups are found in highest frequencies within the Khoe-Sans.
Conclusions We present the largest panel of mt and Y chromosome sequences across Africa, including highly diverse Khoe-San populations from South-Africa. Our findings suggest substantial variation in Y chromosome and mt haplogroups across Africa, and provide evidence for extensive Eurasian admixture among several populations across Africa.

Whole-genome sequence analyses provide new insights into the demographic history and local adaptation of African populations.
S. Fan 1, D.E. Kelly 1, M.H. Beltrame 1, M.E.B. Hansen 1, S. Mallick 2,3,4, T. Nyambo 5, S. Omar 6, D. Meskel 7, G. Belay 7, A. Froment 8, N. Patterson 3, D. Reich 2,3,4, S.A. Tishkoff 1,9. 1) Department of Genetics, University of Pennsylvania, Philadelphia, PA 19104, USA; 2) Department of Genetics, Harvard Medical School, Boston, MA 02115, USA; 3) Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; 4) Howard Hughes Medical Institute, Harvard Medical School, Boston, MA 02115, USA; 5) Department of Biochemistry, Muhimbili University of Health and Allied Sciences, Dar es Salaam, Tanzania; 6) Kenya Medical Research Institute, Center for Biotechnology Research and Development, Nairobi, Kenya; 7) Department of Biology, Addis Ababa University, Addis Ababa, Ethiopia; 8) UMR 208, IRD-MNHN, Musée de l'Homme, Paris, France; 9) Department of Biology, University of Pennsylvania, Philadelphia, PA 19104, USA.

Africa is the origin of modern humans within the past 200,000 years. There are more than 2,000 ethnolinguistic groups in Africa, which encompass around one-third of the world’s languages. To infer the complex demographic history of African populations and adaptation to diverse environments, we sequenced the genomes of 94 individuals from 44 indigenous African populations using high coverage Illumina sequencing technology. Phylogenetic analysis confirms that the San lineage is basal to all other modern human population lineages. The location of other African populations in the phylogenetic tree correlates with geographical location, with the exception of the Central Africa rainforest hunter-gatherer (RHG) populations, who group with Southern African populations. We characterize ancient African population structure by inferring the effective population size and divergence time between populations. A common population bottleneck for all African populations was observed at ~200 thousand years ago (kya), corresponding with paleobiological evidence for modern human origins. Since then, the San and RHG populations have maintained the largest effective population size compared to other populations prior to 10 kya. Using MSMC analysis, we infer that the San population split from the RHG and the East African Khoesan-speaking Hadza and Sandawe hunter-gatherers within the past 66-82 kya, suggesting these populations could have originated from a historically more widespread population of hunter-gatherers. By contrast, the San diverged from all non-Khoesan speaking populations ~100-120 kya. The divergence times of Niger-Kordofanian, Nilo-Saharan and Afroasiatic speaking populations were within the past ~22 to 41 kya. In the RHG populations, the oldest divergence was found between Eastern and Western RHG at ~36-51 kya; the time of divergence of the western RHG populations was inferred to be ~12-18 kya.
Based on the ADMIXTURE analysis, Niger-Kordofanian and RHG populations were pooled for analyses of natural selection. We observed signatures of positive selection at genes involved in muscle development, bone synthesis, reproduction, immune function, energy metabolism, cell signaling, and neural development.

This work is supported by NIH grants 1R01DK104339-01, 1R01GM113657-01, and DP1 ES022577-04 to SAT. The sequencing was funded by the Simons Foundation (SFARI 280376) and the U.S. National Science Foundation (BCS-1032255) grants to DR.

The Genome Diversity in Africa Project: A deep catalogue of genetic diversity across Africa.
D. Gurdasani 1,2, J.P. Martinez 1, M.O. Pollard 1,2, T. Carstensen 1,2, C. Pomilla 1,2, GDAP Investigators 1,2 . 1) Wellcome Trust Sanger Institute, Cambridge, Cambridgeshire, United Kingdom; 2) Department of Medicine, University of Cambridge, Cambridge.

While recent efforts have greatly extended our understanding of genetic diversity in Africa, current sequence panels are limited in their capture of African genetic variation. Deeper sequencing with sampling of diverse indigenous populations is needed to capture diverse haplotypes across Africa. The Genome Diversity in Africa Project (GDAP) aims to characterise diversity from representative populations across all of Africa, including from several indigenous hunter-gatherer populations across the region. This would provide an important global resource to understand human genetic diversity and provide insight into population history and migrations across Africa in recent times. The project has completed sequencing of 575 samples across 23 populations in Africa, including populations from the Gambia, Ghana, Morocco, South Africa, Sudan, Chad, Kenya, South Africa, Uganda, Egypt and Ethiopia. Here, we present preliminary results from the project on 133 samples from 5 ethno-linguistic groups from Morocco, Ghana (Ashanti), Nigeria (Igbo), Kenya (Kalenjin) and South Africa (Zulu) sequenced on the Hiseq X platform (30x).
Methods Reads were mapped to the GRCh38 reference. Following quality control, variant sites were called using HaplotypeCaller v3.5 for each sample to generate gVCFs. GenotypeGVCFs was run across all samples for joint calling. VCFs were filtered using VQSR calibrated on DP, QD, FS, SOR, ReadPosRankSum and MQRankSum annotations. A tranche sensitivity threshold of 99.5% was applied for filtering of SNPs and 99% for indels. Only sites called in >90% of individuals were included.
Results We identified 25.1M SNPs and 2.9M indels among 133 individuals in the GDAP pilot phase, with 25% and 47% of SNPs and indels being novel (not in dbSNP141), respectively. A large proportion of variants per population were private, varying from 12-18%, being greatest among the Kalenjin and Zulu. We found the highest level of heterozygosity and genetic variation among the Zulu, consistent with reported Khoe-San admixture in this group.
Conclusions We present the pilot phase of the Genome Diversity in Africa Project, identifying a high level of diversity across 5 populations from Africa. Inclusion of indigenous population groups, such as the Hadza, Twa Pygmies, and Ju/’hoansi in the next phase will materially advance the understanding of genetic diversity across African populations, and provide an invaluable resource to researchers worldwide.
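The final filtering step in the Methods, keeping only sites called in more than 90% of individuals, is simple enough to sketch. The data layout below (a dictionary of site identifiers mapped to genotype calls, with None for a missing call) and the function name are assumptions for illustration; the abstract does not describe how the filter was implemented, and in practice it would be applied to VCF files with dedicated tools.

```python
def filter_by_call_rate(genotypes_by_site, min_call_rate=0.9):
    """Keep sites whose call rate is strictly greater than min_call_rate.

    genotypes_by_site: dict of site id -> list of genotype calls,
                       where None marks a missing call (hypothetical layout)
    """
    kept = {}
    for site, calls in genotypes_by_site.items():
        called = sum(call is not None for call in calls)
        if calls and called / len(calls) > min_call_rate:
            kept[site] = calls
    return kept


# Toy data for 20 individuals at three hypothetical sites.
toy_sites = {
    "site_A": ["0/1"] * 20,                 # call rate 1.00, kept
    "site_B": ["0/0"] * 18 + [None] * 2,    # call rate 0.90, dropped
    "site_C": ["1/1"] * 19 + [None],        # call rate 0.95, kept
}
toy_kept = filter_by_call_rate(toy_sites)
```

Because the threshold is "more than 90%", a site called in exactly 90% of individuals is excluded in this sketch.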

High-coverage sequencing of the Human Genome Diversity Project (HGDP-CEPH) Panel.
S. McCarthy 1, A. Anders Bergström 1, Y. Xue 1, Q. Ayub 1, S. Mallick 2,3,4, M. Sandhu 1, D. Reich 2,3,4, R. Durbin 1, C. Tyler-Smith 1 . 1) Wellcome Trust Sanger Institute, Cambridge, United Kingdom; 2) Department of Genetics, Harvard Medical School, Boston, MA; 3) Broad Institute of Harvard and MIT, Cambridge, MA; 4) Howard Hughes Medical Institute, Boston, MA.

We discuss the completion of high coverage (>30x), whole-genome sequencing of all 952 core individuals in the Human Genome Diversity Panel (HGDP-CEPH), with the results being made available as an open access population data resource. This widely used panel contains samples from 52 populations spanning Africa, the Middle East, Europe, Asia, Oceania and the Americas, and previous genotype data from these samples have been an important reference resource for human genetic diversity. As seen in the 1000 Genomes Project, having fully open access data, unencumbered by managed access restrictions and other hurdles, is an invaluable driver for democratized data analysis and methods development. Building on previous sequencing efforts by the Simons Genome Diversity Project, we have completed sequencing of the panel and are making the data available via the ENA and the 1000 Genomes Project data management successor, the International Genome Sample Resource (IGSR). All data has moved to the new GRCh38 reference and we present preliminary results on the call set derived from this data. We have GATK HaplotypeCaller and fermikit primary calls, are making mpileup and freebayes calls, and will present an integrated call set that has been computationally phased, together with initial population genetic analyses. A small number of samples are being experimentally phased using 10X Genomics technology which will allow evaluation of phasing accuracy, and also unbiased use of haplotype-based analyses such as MSMC.

Fine-scale identity-by-descent and birth records in Finland provide insights into recent population history.
A.R. Martin 1,2, S. Kirminen 3, A.S. Havulinna 4, A. Sarin 3, A. Palotie 1,2,3, V. Salomaa 4, S. Ripatti 3, M. Pirinen 3, M.J. Daly 1,2 . 1) Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA; 2) Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; 3) Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland; 4) National Institute for Health and Welfare (THL), Helsinki, Finland.

Finland provides unique opportunities to investigate both population and medical genomics because of its adoption of unprecedented uniformity in national electronic health records, concerted coordination of research centers across the country, detailed historical records, as well as recent population bottlenecks that drove specific disease alleles to high frequency. We investigate recent population history (up to ~50 generations ago), particularly relevant to rare, disease-conferring alleles, using identity-by-descent (IBD) haplotype sharing in >10,000 Finns. We compare IBD sharing in Finland to nearby Scandinavian countries with considerably different population histories, including >8,000 Swedes and >30,000 Danes. We find drastically more sharing on average in Finns, including many long tracts. By leveraging fine-scale birth record data, we find a non-linear decay of pairwise IBD sharing with increasing distance across Finland. This arises from pockets of excess IBD sharing; e.g. pairs of individuals from northeast Finland share on average several-fold more of their genome IBD than pairs from southwest regions containing the major cities of Turku and Helsinki. We demonstrate inference of recent migration patterns from IBD sharing patterns. For example, high IBD sharing in northeast Finland radiates from north to south rather than to the west, indicating that migration is restricted near the Russian border. We also investigate recent effective population size changes across regions of Finland and find evidence supporting the distinction between early and late settlement areas. However, our results indicate a more continuous flow of migration than previously posited, with a minimum N_e occurring ~12 generations ago in the northernmost Lapland region and moving further back in time to the south, with a bottleneck detectable in the early settlement area ~40 generations ago.
Lastly, we leverage IBD sharing for genetic disease mapping and show that rare, functional haplotypes show more significant association via IBD mapping than single variants with linear mixed effect models.
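The relationship described above between pairwise IBD sharing and birthplace distance can be summarised with a simple binning sketch. This is an illustration only, not the authors' method: the input format (pairs of latitude/longitude birthplaces plus total shared IBD in centimorgans), the 100 km bin width and the function names are all assumptions.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple

EARTH_RADIUS_KM = 6371.0
Coordinate = Tuple[float, float]  # (latitude, longitude) in degrees


def haversine_km(a: Coordinate, b: Coordinate) -> float:
    """Great-circle distance between two points, in kilometres."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def mean_ibd_by_distance(
    pairs: Iterable[Tuple[Coordinate, Coordinate, float]], bin_km: float = 100.0
) -> Dict[float, float]:
    """Mean shared IBD (cM) per distance bin, keyed by the bin's lower edge in km."""
    totals: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for birthplace_a, birthplace_b, ibd_cm in pairs:
        bin_index = int(haversine_km(birthplace_a, birthplace_b) // bin_km)
        totals[bin_index] += ibd_cm
        counts[bin_index] += 1
    return {i * bin_km: totals[i] / counts[i] for i in sorted(totals)}


if __name__ == "__main__":
    helsinki = (60.17, 24.94)
    turku = (60.45, 22.27)
    # IBD values below are invented purely to exercise the function.
    toy_pairs = [
        (helsinki, helsinki, 40.0),
        (helsinki, helsinki, 20.0),
        (helsinki, turku, 10.0),
    ]
    print(mean_ibd_by_distance(toy_pairs))
```

Plotting the resulting bin means against distance would show whether the decay is roughly linear or, as the abstract reports, non-linear with regional pockets of excess sharing.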

Y-chromosomal composition of mediaeval and contemporary populations in Norway and adjacent Scandinavian countries: Y-STR haplotypes and the rare Y-haplogroup Q. 
B. Berger 1, S. Willuweit 2, H. Niederstätter 1, P. Kralj 1, L. Roewer 2, W. Parson 1,3. 1) Institute of Legal Medicine, Medical University of Innsbruck, Innsbruck, Austria; 2) Department of Forensic Genetics, Institute of Legal Medicine and Forensic Sciences, Charité-Universitätsmedizin, Berlin, 13353, Germany; 3) Forensic Science Program, The Pennsylvania State University, PA, USA.

In the framework of the project “Immigration and mobility in mediaeval and post-mediaeval Norway” molecular genetic analyses were performed on 97 pre-modern human remains including genetic sexing and Y-chromosomal DNA typing. All samples were subjected to molecular genetic analyses of the sex using “Genderplex” consisting of two different regions of the amelogenin gene, SRY and four X-STR loci. Sex assignment was possible for 90% of the extracted remains (n=87). Of these, 49 (56.3%) gave a genetically male result. All of these DNA extracts were subjected to Y-STR analysis using Yfiler Plus PCR Amplification Kit (Thermo Fisher Scientific) and/or PowerPlex Y23 System (Promega). At least partial Y-STR profiles were obtained from all samples. A detailed comparison between mediaeval/post-mediaeval and contemporary Y-chromosomes was performed by searching the obtained haplotypes (HTs) in the Y Chromosome Haplotype Reference Database (YHRD), comprising 154,329 haplotypes from 991 populations in 129 countries at the time of query (Release 50). YHRD searches of the pre-modern haplotypes yielded full matches plus neighbor-matches differing at only one allele from the query HT. Matches are presented with geographical and ancestry information of the contemporary HTs. For samples without direct YHRD-matches, this information is provided through their neighbor HTs. AMOVA was performed using the YHRD online tool on pairwise R ST values to create the corresponding MDS plots. The pre-modern HTs were grouped according to mediaeval and post-mediaeval origin and compared to contemporary Scandinavian (Norwegian, Swedish and Danish), Northwest European, and Northeast European populations. Both pre-modern populations showed small genetic distances to contemporary Scandinavians and larger distances to Northeast Europeans with Northwest European populations in between.
As expected, an initial assessment of the Y-chromosomal haplogroups (HGs) showed that most of the samples were attributable to the main European HGs I1, R1a and R1b. However, one of the HTs seemed to be associated with HG-Q which is rare in Europe and hitherto little evaluated in this region. Network analysis was applied for detecting similar HTs in contemporary samples from Norway and adjacent Northern European countries stored in the YHRD. The outcomes of this survey should initiate a detailed SNP based HG-assessment of HG-Q candidate samples.
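The distinction drawn above between full matches and neighbor-matches (haplotypes differing from the query at only one allele) can be sketched as follows. This is a toy illustration, not the YHRD search implementation: the dictionary representation of a haplotype, the function names and the rule that a locus missing from the reference counts as a mismatch are all assumptions. The locus names are real Y-STR markers, but the allele values are invented.

```python
from typing import Dict, List, Optional

Haplotype = Dict[str, int]  # locus name -> allele (repeat count)


def count_mismatches(query: Haplotype, reference: Haplotype) -> int:
    """Number of query loci whose allele differs from, or is absent in, the reference."""
    return sum(
        1 for locus, allele in query.items() if reference.get(locus) != allele
    )


def classify_match(query: Haplotype, reference: Haplotype) -> Optional[str]:
    """Return 'full' for 0 mismatches, 'neighbor' for exactly 1, otherwise None."""
    mismatches = count_mismatches(query, reference)
    if mismatches == 0:
        return "full"
    if mismatches == 1:
        return "neighbor"
    return None


def search(query: Haplotype, database: Dict[str, Haplotype]) -> Dict[str, List[str]]:
    """Group database entries into full matches and neighbor-matches of the query."""
    hits: Dict[str, List[str]] = {"full": [], "neighbor": []}
    for name, reference in database.items():
        kind = classify_match(query, reference)
        if kind is not None:
            hits[kind].append(name)
    return hits


if __name__ == "__main__":
    query = {"DYS19": 14, "DYS390": 23, "DYS391": 10}
    database = {
        "ref_1": {"DYS19": 14, "DYS390": 23, "DYS391": 10},
        "ref_2": {"DYS19": 14, "DYS390": 24, "DYS391": 10},
        "ref_3": {"DYS19": 15, "DYS390": 25, "DYS391": 10},
    }
    print(search(query, database))
```

Only the loci typed in the query are compared, so a partial profile from degraded remains can still be searched against complete reference haplotypes.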

Evidence for detailed historical European population structure from large-scale, diverse genetic polymorphism data.
P. Carbonetto 1, J. Byrnes 1, J.M. Granka 1, Y. Wang 1, K. Noto 1, E. Han 1, A.R. Kermany 1, K.A. Rand 1, E. Elyashiv 1, H. Guturu 1, N.M. Myres 2, E.L. Hong 1, R.E. Curtis 2, K.G. Chahine 2, C.A. Ball 1. 1) DNA, LLC, San Francisco, CA; 2) DNA, LLC, Lehi, UT.

Despite the recent surge of interest in ancient genomes, we show that there is still much to be elucidated about human demography from contemporary genomes. Here, we demonstrate the use of genealogical data to generate demographic insights from analysis of a large-scale, heterogeneous genetic data set. Specifically, we show that an unsupervised ADMIXTURE analysis of genotypes from 131,293 primarily US-born individuals, followed by a simple statistical analysis of the 3 million pedigree records linked to these genotype samples, yields novel insights into European genetic diversity. In contrast to principal component analysis (PCA), which is the most widely used approach to investigating European genetic diversity, we use ADMIXTURE to infer genetically differentiated source populations reflecting more distant historical time periods. Unsurprisingly, among European-origin individuals, admixture is pervasive. Despite this, our ADMIXTURE analysis with K = 12 ancestral populations identifies 5 stable, genetically differentiated groups within Europe (with putative historical counterparts in parentheses): Ashkenazi Jewish, Irish (Celts), Eastern Europeans (Slavs), Scandinavians (Nordics) and Iberians, featuring Basques and Sardinians. The genealogical data also allow us to provide a detailed portrait of the genetic composition of contemporary peoples across North America (e.g., Iberians in Cuba), and other parts of the world. This work suggests the potential for drawing more detailed connections between present-day and ancient genetic variation by leveraging large, heterogeneous genetic data sets.

Genomic insights into the population structure and history of the Irish Travellers.
E.H. Gilbert 1, S. Carmi 2, S. Ennis 3, J.F. Wilson 4,5, G.L. Cavalleri 1. 1) Molecular and Cellular Therapeutics, Royal College of Surgeons in Ireland, Dublin, Leinster, Ireland; 2) Braun School of Public Health, The Faculty of Medicine, The Hebrew University of Jerusalem, Jerusalem, Israel; 3) School of Medicine and Medical Science, University College Dublin, Dublin, Ireland; 4) Centre for Global Health Research, Usher Institute for Population Health Sciences and Informatics, University of Edinburgh, Teviot Place, Edinburgh, Scotland; 5) MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, Western General Hospital, Crewe Road, Edinburgh, Scotland.

Aims: The Irish Travellers are a population with a history of nomadism. Consanguineous unions are common, and as a population they are socially and genetically isolated from the surrounding, “settled” Irish population. A previous low-resolution genetic analysis suggested a common Irish origin between the settled and the Traveller populations. What is not known, however, is the extent of population structure within the Irish Traveller population, the time of divergence from the general Irish population, and the extent of autozygosity.
Methods: We recruited Irish Travellers from across Ireland and the UK. To be included a participant had to have had at least three grandparents with a surname associated with the Irish Travellers. DNA was extracted from saliva samples, and genotypes were generated using the Illumina OmniExpress SNP genotyping platform. With this data, we investigated population structure using fineStructure, quantified the levels of autozygosity with PLINK, and estimated a time of divergence using a method based on Identity by Descent (IBD) segment identification.
Results: We merged, cleaned, and analysed data from 42 Irish Travellers, 2232 settled Irish, 2039 British, 143 Roma Gypsies, and 931 individuals from 57 world-wide populations. We confirm an Irish origin for the Irish Travellers, demonstrate evidence for population substructure within the population, confirm high levels of autozygosity consistent with a consanguineous population, and for the first time provide estimates for a date of divergence between the Irish Travellers and settled Irish.
Conclusion: Our findings have implications for disease mapping within Ireland, as well as on the social history of the Irish Traveller population.

Personal ancestry inference at the finest scale reveals more sub-structure in the UK.
D. Lawson, G. Weyenberg. Integrative Epidemiology Unit, University of Bristol, Bristol, United Kingdom.

Chromosome Painting has revealed genetic differences within the UK at a very fine scale [1], with structured genetic variation within a single county in some cases (such as Cornwall & South Wales). However, in that work, it was not possible to genetically distinguish much of England, which appeared as a single homogeneous group. Here, we describe an extension to the Fine-STRUCTURE [2] clustering that can further distinguish ancestry even within England; for example, identifying regions such as Norfolk, the Midlands and the South as genetically distinct. The approach works by using the known county locations to craft genetic features to use in unsupervised clustering. Specifically, we group individuals by their geographic sampling location into reference donor populations. This forms an ancestry profile - which can be viewed as a careful choice of feature vector - that still allows unsupervised genetic clustering for all individuals. Further, we describe how this approach allows individuals to be described as an admixture of the inferred geographical clusters. This allows ancestral information to be recovered for individuals who are not purely represented by a single geographical location. This also allows us to characterise the genetic relationship between the inferred clusters, several of which represent drift that is most strongly represented by a particular geographical region (including Cornwall, Wales, Scotland and the North of England) and others of which represent characteristic admixture proportions between these ancestral drifted populations. Beyond improving resolution, this approach facilitates personal genomics because individuals can be represented in terms of the fixed reference panel. We demonstrate the utility of the approach by describing the ancestry of the UK10K participants in terms of the new, high resolution POBI clusters. 
Previously, a similar analysis [3] without geographical information inferred little population structure in the UK from these samples, but now we have a rich representation of their population structure, including an assessment of admixture from outside the UK. This highlights the value in high quality fine-scale geographic sampling, which could now facilitate this level of ancestry identification for many other countries.

[1] Leslie et al 2015, Nature 519:309–314 [2] Lawson et al 2012, PLoS Genet. 8:e1002453 [3] UK10K Consortium 2015, Nature 526:82-90.

Chromosome painting for arbitrary sample collections.
G. Weyenberg, D. Lawson. Integrative Epidemiology Unit, University of Bristol, Bristol, United Kingdom.

Haplotype-based methods have been demonstrated to be capable of detecting fine scale structure within human populations—to the point of distinguishing genetic variation at the sub-county level in the South West of England [1]. However, the aforementioned method implements an all-against-all analysis of sampled individuals, which is not suited to all applications, including personal genomics where samples are obtained individually or in small batches. Here, we describe an extension of the FineSTRUCTURE [2] method to allow for painting of individual samples against a panel of pre-calculated reference haplotype clusters, making the method computationally feasible for on-demand analysis of individuals. The choice of the reference panel also allows the user to tailor the analysis to emphasise targeted features of the data. For example, in the context of a personal ancestry imputation, panels may be constructed to focus on global-, continental-, or national-scale genetic features, and the low computational cost of painting an individual against a pre-computed panel makes sample-level exploratory analysis feasible. Another application of the panel-based painting is to use high-quality reference data to impute unknown geographical labels to samples where such information is either unavailable, or was collected at an undesirable resolution. To demonstrate the latter application, we analysed several populations with suspected Northern-European ancestry—including the Hapmap CEU and ASW populations, and the UK10K dataset—with respect to panels of Europeans and the high-resolution People of the British Isles (POBI) samples. These individuals are characterised in terms of an admixture of inferred clusters in the reference populations. Whilst many individuals were best described as a complex admixture that likely occurred over many generations, many others had a clear signal of geographically distinct ancestry.

[1] Leslie et al 2015, Nature 519:309–314 [2] Lawson et al 2012, PLoS Genet. 8:e1002453.

Local ancestry patterns inferred from one million genomes recapitulate fine-scale population history.
Y. Wang 1, K. Noto 1, J. Byrnes 1, R.E. Curtis 2, E. Han 1, E. Elyashiv 1, H. Guturu 1, P. Carbonetto 1, A.R. Kermany 1, J.M. Granka 1, K.A. Rand 1, N.M. Myres 2, E.L. Hong 1, C.A. Ball 1, K.G. Chahine 2. 1) DNA, LLC, San Francisco, CA; 2) DNA, LLC, Lehi, UT.

In a country of immigrants, population structure is shaped by a long, ongoing history of immigration, followed by subsequent admixture and migration. All these events have left their footprints in the genomic landscape of current residents and make it possible for geneticists to reconstruct population history from genomic data. However, deciphering the signature of these forces requires accurate inference of genomic tracts that one individual inherits from ancestors of different origins. Previously, several methods have been developed for inferring local ancestry with varying levels of success. Unfortunately, none of these methods can be feasibly applied to a data set of one million genomes. Recently, our team presented Polly, a novel algorithm for estimating genome-wide ancestry proportions in admixed samples. Polly, built on a modified version of the BEAGLE haplotype model, relies on this model to achieve two things: first, to account for phasing uncertainty, and second, to provide a measure of distance between a query haplotype and a reference haplotype. Using haplotype models learned from hundreds of thousands of haplotypes and subsequently annotated with over eight thousand single-origin reference individuals, Polly performs ultra-fast inference of both global and local ancestry. In this study, we evaluate Polly's accuracy in predicting local ancestry using simulated admixed samples with known genomic composition. We assess the assignment accuracy, the switching pattern and the tract length distribution. Using a cross-validation experiment, we confirm that Polly makes highly accurate local ancestry estimates even at the subcontinental level. We further use Polly to analyze one million genomes from the United States and discover distinct local ancestry patterns among different ethnic groups and communities, especially among African Americans and Latino Americans. We map local ancestry estimates to individuals’ geographic locations.
Our results illustrate clear population structure arising from immigration routes, assortative mating and isolation by distance. We also find evidence that supports large scale domestic migration events, as exemplified by the Great Migration of African Americans following the abolition of slavery. Finally, we attempt to date known historical events from ancestry tract length distributions. Overall, our analysis demonstrates the power of combining local ancestry analysis with big data in studying fine-scale population history.