Geography of Ancestry: the SPA analysis of the MDLP participants
A team of researchers (Wen-Yun Yang, John Novembre, Eleazar Eskin, Eran Halperin) from Tel Aviv University (TAU) and the University of California, Los Angeles (UCLA) has created a method for pinpointing the geographic origin of a person's ancestry more precisely by modeling the spatial diversity of genes. The analysis of genetic diversity within and between populations has broad applications in studies of human disease and human migrations. The team proposed a new approach, spatial ancestry analysis (SPA), which explicitly models the spatial distribution of each SNP by describing its allele frequency as a continuous function in geographic space.
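To make the idea of "an allele frequency as a continuous function in geographic space" concrete, here is a minimal sketch. It assumes a logistic surface with a per-SNP gradient vector a and offset b; the function names and the coefficient values are made up for illustration and are not taken from the SPA software.

```python
import math

def allele_frequency(x, y, a, b):
    """Allele frequency at map position (x, y) on a logistic surface.

    a = (a_x, a_y) is a hypothetical per-SNP gradient vector, b an offset.
    """
    z = a[0] * x + a[1] * y + b
    return 1.0 / (1.0 + math.exp(-z))

def slope_steepness(a):
    """Length of the gradient vector: larger means a sharper frequency change."""
    return math.hypot(a[0], a[1])

# Made-up coefficients for a single SNP (illustration only).
a, b = (0.8, -0.6), 0.1
west = allele_frequency(-2.0, 0.0, a, b)  # frequency at a western location
east = allele_frequency(2.0, 0.0, a, b)   # frequency at an eastern location
print(west, east, slope_steepness(a))
```

Under this assumption the frequency changes smoothly along the direction of a, and the length of a gives a natural measure of how steep the gradient is.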
Although the authors were more concerned with detecting the signals of selective sweeps in the human genome, the SPA software implements some interesting features that can be applied immediately to the analysis of genetic data collected in open genome projects.
The most important one is that the explicit modeling of the allele frequency allows individuals to be
localized on the map on the basis of their genetic information alone.
From the original paper on model-based approach for analysis of spatial structure in genetic data:
If the geographic origins of the individuals are known, one can use this information to infer their allele frequency functions at each SNP. However, if locations are not known, our model can infer geographic origins for individuals using only their genetic data, in a manner similar in spirit to PCA-based approaches for spatial assignment.
The experiment
Since the authors have made their software publicly available, I decided to give SPA a try. The learning curve was gentle, because three of the five supported input formats are Plink formats (with which I am familiar). The hardest part of the experiment was actually deciding what to do with the unknown geographic origins of the MDLP participants. Following a hint found in another interesting paper (A Quantitative Comparison of the Similarity between Genes and Geography in Worldwide Human Populations), I divided the experiment into the following steps:
1) First, I obtained the geographic coordinates (latitude and longitude) of each population included in the run.
2) Then I carried out the SPA analysis with 3 specified dimensions.
3) After the SPA analysis finished, I applied Procrustes analysis to compare the individual-level coordinates of the first two components (1 and 2) of the SPA performed on the SNP data (1,440,447 SNPs) with the geographic coordinates.
4) Using Procrustes analysis, I identified an optimal alignment of the genetic coordinates to the (Gilbert-projected) geographic coordinates, which involved a rotation of the longitudes and latitudes by 16° counterclockwise.
5) Finally, I projected the individual coordinates (previously corrected for the optimal Procrustes alignment) onto the geographic map of Eurasia.
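The Procrustes step in the list above can be sketched as follows. This is a minimal two-dimensional illustration in plain Python, not the code used for this experiment: it finds the rotation, scaling and shift that best match one set of points to another in the least-squares sense, ignoring reflections. The function names and the toy coordinates are hypothetical, and the geographic coordinates are assumed to be already projected onto a plane.

```python
import math

def center(points):
    """Shift points so that their centroid is the origin; also return the centroid."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    return [(p[0] - mx, p[1] - my) for p in points], (mx, my)

def procrustes_2d(genetic, geographic):
    """Rotate, scale and shift `genetic` to best match `geographic`.

    Returns (aligned_points, rotation_in_degrees); a positive angle is a
    counterclockwise rotation. Reflections are not considered.
    """
    g, _ = center(genetic)
    t, (tx, ty) = center(geographic)
    dot = sum(gx * px + gy * py for (gx, gy), (px, py) in zip(g, t))
    cross = sum(gx * py - gy * px for (gx, gy), (px, py) in zip(g, t))
    theta = math.atan2(cross, dot)  # least-squares optimal rotation angle
    scale = math.hypot(dot, cross) / sum(gx * gx + gy * gy for gx, gy in g)
    c, s = math.cos(theta), math.sin(theta)
    aligned = [(scale * (c * gx - s * gy) + tx, scale * (s * gx + c * gy) + ty)
               for gx, gy in g]
    return aligned, math.degrees(theta)

# Toy data (made up): four "individuals" with genetic and map coordinates.
toy_genetic = [(0.1, 0.2), (0.9, 0.1), (1.0, 0.7), (0.3, 0.9)]
toy_geographic = [(10.0, 45.0), (30.0, 44.0), (32.0, 58.0), (14.0, 62.0)]
aligned, angle = procrustes_2d(toy_genetic, toy_geographic)
print(round(angle, 2))
```

The aligned points are expressed in the same units as the geographic coordinates, so they can be drawn directly on the map.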
The MDLP participants can find their final geographic coordinates in the corresponding spreadsheet.
The allele frequency gradients and signals of recent positive selection
Another cool feature of the SPA software is that it can identify loci showing extreme frequency gradients (i.e., candidate loci under selection) without requiring individuals to be grouped into populations. These are SNPs that show steep slopes of allele frequency change, with the consideration that some of them may show extreme gradients because of the impact of recent positive selection.
The analysis of selective sweeps (as well as their possible implications) belongs to the domain of molecular biology and medical genetics, and because of the limited scope of this project I am not going to discuss them in detail. I'll limit my discussion to the following observations: the directions of the allele frequency gradients resemble the presumed gene flow from East Eurasia to West Eurasia, and from Southern Europe to Northern Europe. The first two dimensions of SPA capture the main features of variation along the well-known East-West Eurasian cline, while the second and third dimensions represent the gene flow from Southern Europe to Northern Europe.
I sorted the SNPs by the value of the slope function, and the most extreme individual value is found at rs7568419, a SNP believed to be linked to a genetically inherited trait, the chin dimple (cleft chin). Researchers at 23andMe have identified two genetic variants associated with the trait in people of European ancestry. The C version of rs10953183 is associated with a more pronounced chin dimple, and the C version of rs7568419 is associated with less of a chin dimple.
A couple of factoids about a cleft chin from Wikipedia:
"This is an inherited trait in humans, where the dominant gene causes the cleft chin while the recessive genotype presents without a cleft. However, it is also a classic example for variable penetrance with environmental factors or a modifier gene possibly affecting the phenotypical expression of the actual genotype. Although cleft chins are seen throughout the world, they are most predominate among people of Germanic and West Slavic (i.e., Polish) ethnicity. It is very common in that part of the world and among descendants of people originating in that part of Europe. It seems particularly prevalent among people living in the former Prussian areas of northern Poland bordering the Baltic Sea."
Those interested in a more detailed analysis of loci under selection can find the SPA output file in the corresponding spreadsheet (note: the value of the slope function is in the last column). If you find an interesting SNP association with a particular trait, please report your finding to me.
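For readers who want to repeat the sorting on the output file themselves, here is a minimal sketch. The only thing it takes from the post is that the slope value sits in the last column; the whitespace-delimited layout, the position of the SNP identifier (second column, as in a Plink .bim file) and the sample rows are assumptions made for illustration.

```python
def rank_by_slope(lines, id_col=1, top=10):
    """Return the `top` (snp_id, slope) pairs with the largest slope values.

    Assumes whitespace-delimited rows with the slope in the last column and
    the SNP identifier in column `id_col` (0-based); both are assumptions.
    """
    rows = []
    for line in lines:
        fields = line.split()
        if len(fields) <= id_col:
            continue  # skip blank or truncated lines
        try:
            slope = float(fields[-1])
        except ValueError:
            continue  # skip header or malformed rows
        rows.append((fields[id_col], slope))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:top]

# Made-up rows with placeholder identifiers (not real SPA output).
demo = [
    "CHR SNP CM BP A1 A2 SLOPE",
    "1 snp_A 0 1000 A G 0.12",
    "1 snp_B 0 2000 C T 0.87",
    "",
    "2 snp_C 0 3000 A C 0.45",
]
for snp_id, slope in rank_by_slope(demo, top=2):
    print(snp_id, slope)
```

To use it on a real file, pass an open file handle instead of the demo list, for example `rank_by_slope(open(path))` with `path` pointing at your own copy of the output.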
you probably won't contact me, but i was on GEDmatch and saw your page. my kit is A716396 and i would like to know if there is any American Indian genetic matches. So i would love for you to tell me anything and everything you can based on my DNA test. Thank you. Janet
Hi Janet, I am not the original poster of the blog, just a reader, but I saw this comment is recent and wanted to help ... I don't think the blog poster has personal control over your file on GEDMatch. Have you tried running your file through any of the analysis tools on GEDMatch yourself yet? This one is called MDLP World or something like that. If you have trouble finding how to use the tool on GEDMatch, you could try submitting your .txt file on a website called DNASolves. They are run by a third party group that helps Law Enforcement identify possible distant relatives of unidentified deceased individuals, in the hope of identifying them. When you volunteer your DNA on that site, they give you an analysis based on this tool in return. That site is a little more user-friendly. Hope this helps!