Friday, September 28, 2012

Geography of Ancestry: the SPA analysis of the MDLP participants

A team of researchers (Wen-Yun Yang, John Novembre, Eleazar Eskin, Eran Halperin) from Tel Aviv University (TAU) and the University of California, Los Angeles (UCLA) has created a method for pinpointing the geographic origin of a person's ancestry more precisely by modeling the spatial diversity of genes. The analysis of genetic diversity within and between populations has broad applications in studies of human disease and human migrations. The team proposed a new approach, spatial ancestry analysis (SPA), which explicitly models the spatial distribution of each SNP by assigning it an allele frequency that is a continuous function of position in geographic space.
Although the authors were more concerned with detecting the signals of selective sweeps in the human genome, the SPA software implements some interesting features that can be applied immediately to the analysis of genetic data collected in open genome projects.
The most important one is that the explicit modeling of the allele frequency allows individuals to be localized on the map on the basis of their genetic information alone.

From the original paper on a model-based approach for the analysis of spatial structure in genetic data:

If the geographic origins of the individuals are known, one can use this information to infer their allele frequency functions at each SNP. However, if locations are not known, our model can infer geographic origins for individuals using only their genetic data, in a manner similar in spirit to PCA-based approaches for spatial assignment.
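A minimal sketch of the idea in the quote above, under stated assumptions. It assumes that each SNP's allele frequency is a logistic function of position with a per-SNP slope vector `a` and intercept `b` (my reading of the SPA paper), and it replaces whatever optimizer the SPA software actually uses with a plain grid search. The function names and the toy coefficients are hypothetical and are not taken from the SPA code.

```python
import math

def allele_frequency(x, y, a, b):
    """Logistic allele-frequency surface: 1 / (1 + exp(-(a . (x, y) + b)))."""
    return 1.0 / (1.0 + math.exp(-(a[0] * x + a[1] * y + b)))

def log_likelihood(genotypes, snps, x, y):
    """Log-likelihood of 0/1/2 allele counts at location (x, y).

    The binomial coefficient is omitted because it does not depend
    on the location and so does not change the best position.
    """
    total = 0.0
    for g, (a, b) in zip(genotypes, snps):
        f = allele_frequency(x, y, a, b)
        f = min(max(f, 1e-9), 1.0 - 1e-9)  # keep log() finite
        total += g * math.log(f) + (2 - g) * math.log(1.0 - f)
    return total

def localize(genotypes, snps, grid):
    """Return the grid point that maximizes the genotype likelihood."""
    return max(grid, key=lambda p: log_likelihood(genotypes, snps, p[0], p[1]))

if __name__ == "__main__":
    # Two made-up SNPs: one varies along x, the other along y.
    snps = [((1.0, 0.0), 0.0), ((0.0, 1.0), 0.0)]
    grid = [(x, y) for x in range(-2, 3) for y in range(-2, 3)]
    print(localize([2, 0], snps, grid))
```

With known locations the same likelihood can be maximized over `a` and `b` instead, which corresponds to the first case described in the quote.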

 The experiment

Since the authors have made their software publicly available, I decided to give SPA a try. The learning curve was gentle, because three of the five supported input formats are PLINK formats (with which I am familiar). The hardest part of the experiment was deciding what to do with the unknown geographic origins of the MDLP participants. Following a hint found in another interesting paper (A Quantitative Comparison of the Similarity between Genes and Geography in Worldwide Human Populations), I divided the experiment into five steps:
1) first, I obtained the geographic coordinates (latitudes and longitudes) of each population included in the run.
2) then I carried out the SPA analysis with 3 specified dimensions.
3) after the SPA analysis was finished, I applied Procrustes analysis to compare the individual-level coordinates of the first two components (1 and 2) of the SPA performed on the SNP data (1,440,447 SNPs) with the geographic coordinates.
4) using Procrustes analysis, I identified an optimal alignment of the genetic coordinates to the (Gilbert-projected) geographic coordinates; it involved rotating the longitudes and latitudes by 16° counterclockwise.
5) finally, I projected the individual coordinates (already corrected for the optimal Procrustes alignment) onto the geographic map of Eurasia.
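The Procrustes step can be sketched as follows. This is an illustration rather than the script used for the run: it handles only two dimensions, fits a rotation, a uniform scale and a translation (no reflection), and rotates the genetic coordinates onto the geographic ones rather than the reverse. The function names are hypothetical.

```python
import math

def center(points):
    """Subtract the centroid; return the centered points and the centroid."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    return [(x - mx, y - my) for x, y in points], (mx, my)

def procrustes_2d(genetic, geographic):
    """Least-squares alignment of 2D genetic coordinates to geographic ones.

    Returns (angle in degrees, scale, aligned points). A positive angle
    is a counterclockwise rotation of the genetic coordinates.
    """
    if len(genetic) != len(geographic) or not genetic:
        raise ValueError("need two non-empty point lists of equal length")
    g, _ = center(genetic)
    t, (tx0, ty0) = center(geographic)
    dot = sum(gx * tx + gy * ty for (gx, gy), (tx, ty) in zip(g, t))
    cross = sum(gx * ty - gy * tx for (gx, gy), (tx, ty) in zip(g, t))
    norm_g = sum(gx * gx + gy * gy for gx, gy in g)
    if norm_g == 0.0:
        raise ValueError("genetic coordinates are all identical")
    theta = math.atan2(cross, dot)            # optimal rotation angle
    scale = math.hypot(dot, cross) / norm_g   # optimal uniform scale
    c, s = math.cos(theta), math.sin(theta)
    aligned = [(scale * (c * gx - s * gy) + tx0,
                scale * (s * gx + c * gy) + ty0) for gx, gy in g]
    return math.degrees(theta), scale, aligned
```

The closed form comes from maximizing the sum of dot products between the rotated genetic points and the geographic points, which in two dimensions reduces to a single `atan2` call. Latitudes and longitudes should be projected to planar coordinates before they are passed in.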

The MDLP participants can find their final geographic coordinates in the corresponding spreadsheet.

The allele frequency gradients and signals of recent positive selection

Another cool feature of the SPA software is that it can identify loci showing extreme frequency gradients (i.e., candidate loci under selection) without requiring individuals to be grouped into populations. These are SNPs with steep slopes of allele frequency change; some of them might show extreme gradients because of recent positive selection.

The analysis of selective sweeps (and their possible implications) belongs to the domain of molecular biology and medical genetics, and because of the project's limitations I am not going to discuss them in detail. I'll limit my discussion to the following observations: the directions of the allele frequency gradients resemble the presumed gene flow from East Eurasia to West Eurasia, and from Southern Europe to Northern Europe. The first two dimensions of SPA capture the main features of variation along the well-known East-West Eurasian cline, while the second and third dimensions represent the gene flow from Southern Europe to Northern Europe.


I've sorted the SNPs by the value of the slope function, and the most extreme individual value belongs to rs7568419, a SNP that is believed to be linked to a genetically inherited trait, the chin dimple (cleft chin). Researchers at 23andMe have identified two genetic variants associated with the trait in people of European ancestry. The C version of rs10953183 is associated with a more pronounced chin dimple, and the C version of rs7568419 is associated with less of a chin dimple.

A couple of factoids about a cleft chin from Wikipedia:
"This is an inherited trait in humans, where the dominant gene causes the cleft chin while the recessive genotype presents without a cleft. However, it is also a classic example for variable penetrance with environmental factors or a modifier gene possibly affecting the phenotypical expression of the actual genotype. Although cleft chins are seen throughout the world, they are most predominate among people of Germanic and West Slavic (i.e., Polish) ethnicity. It is very common in that part of the world and among descendants of people originating in that part of Europe. It seems particularly prevalent among people living in the former Prussian areas of northern Poland bordering the Baltic Sea."

Those who are interested in a more detailed analysis of loci under selection can find the SPA output file in the corresponding spreadsheet (note: the value of the slope function is in the last column). If you find an interesting association between a SNP and a particular trait, please report your finding to me.
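For readers who want to rank the SNPs themselves, here is a small sketch. It assumes only what is stated above, namely that the slope value is in the last column of each row. The whitespace-delimited layout, the function name and the sample rows are hypothetical and should be adapted to the actual export.

```python
def top_snps_by_slope(lines, n=10):
    """Return the n rows with the largest slope value (last column).

    Each returned item is (fields, slope). Blank lines and rows whose
    last column is not numeric (e.g. a header) are skipped.
    """
    rows = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        try:
            slope = float(fields[-1])
        except ValueError:
            continue  # header or malformed row
        rows.append((fields, slope))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:n]

if __name__ == "__main__":
    # Made-up rows for illustration only; not real SPA output.
    sample = [
        "chr pos id slope",
        "1 100 snpA 0.5",
        "1 200 snpB 2.5",
        "2 300 snpC 1.0",
    ]
    for fields, slope in top_snps_by_slope(sample, n=2):
        print(fields[2], slope)
```

A spreadsheet saved as tab-separated text can be passed in directly as an open file object, since iterating over a file yields its lines.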



