
Tuesday, October 4, 2011

Simulated SNP-populations of MDLP

Yesterday I set out to repeat the "simulation" experiments on my project's SNP dataset, using the PLINK simulation technique first applied in a population-genetics context by Dienekes (analogous experiments were carried out by the Harappa DNA BGA project and the Eurogenes BGA project).
Synthetic "ancestral" populations (Altaic, Anatolian-Balkanian, Balto-Slavic, North-Atlantic, Scandinavian, Volga-Uralic and Celto-Germanic) were simulated with PLINK's standard simulation routine, each synthetic population consisting of 5 generated individuals:
plink --simulate wgas1.sim --make-bed --out sim11
plink --simulate wgas2.sim --make-bed --out sim12
and so on for the remaining five populations.
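For SNPs simulated with odds ratios of 1 (null markers), the sampling step can be pictured as drawing each genotype under Hardy-Weinberg equilibrium from the SNP's allele frequency. The Python sketch below illustrates that idea only; it is an assumption about the underlying model, not PLINK's own code, and the function name `simulate_population` is hypothetical.

```python
import random


def simulate_population(freqs, n_individuals=5, seed=None):
    """Simulate genotypes for one synthetic population (illustrative sketch).

    freqs: frequency of the counted allele at each SNP.
    Returns one list per individual holding 0, 1 or 2 copies of that
    allele per SNP. Under Hardy-Weinberg equilibrium the two alleles of
    a genotype are independent draws, each equal to the counted allele
    with probability p.
    """
    rng = random.Random(seed)
    return [
        [sum(rng.random() < p for _ in range(2)) for p in freqs]
        for _ in range(n_individuals)
    ]


# Usage: five synthetic individuals from a toy set of three allele frequencies.
pop = simulate_population([0.1, 0.5, 0.9], n_individuals=5, seed=1)
```

With 5 individuals per population the sample allele frequencies will scatter widely around the input frequencies, which is worth keeping in mind when reading the PCA of such small synthetic groups.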
In the simulation, we assumed that each of the 7 clusters, defined by a specific combination of allele frequencies at c. 100,000 SNPs (taken from an unsupervised ADMIXTURE run at K=7), represents one ancestral population.
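One way to turn the ADMIXTURE allele frequencies into PLINK simulation files is to write one line per SNP, pinning the frequency by setting the lower and upper bounds to the same value. The sketch below assumes the PLINK 1.07 `.sim` column order (number of SNPs, label, lower frequency, upper frequency, heterozygote odds ratio, homozygote odds ratio), assumes the ADMIXTURE `.P` file rows are in the same SNP order as the supplied SNP IDs, and uses hypothetical helper names (`sim_lines`, `write_sim_files`); it is not necessarily how the `wgas*.sim` files above were produced.

```python
from pathlib import Path


def sim_lines(freqs, snp_ids):
    """One PLINK 1.07 --simulate line per SNP (assumed column order:
    n SNPs, label, lower freq, upper freq, het OR, hom OR).
    lower == upper pins the frequency; odds ratios of 1 make the SNP null.
    """
    return [f"1 {snp} {p:.4f} {p:.4f} 1 1" for snp, p in zip(snp_ids, freqs)]


def write_sim_files(p_path, snp_ids, prefix="wgas"):
    """Write <prefix>1.sim ... <prefix>K.sim, one file per ADMIXTURE cluster.

    p_path: ADMIXTURE .P file (one row per SNP, one column per cluster),
    assumed to follow the same SNP order as snp_ids (e.g. the .bim file).
    """
    rows = [
        [float(x) for x in line.split()]
        for line in Path(p_path).read_text().splitlines()
        if line.strip()
    ]
    if len(rows) != len(snp_ids):
        raise ValueError(".P file and SNP list differ in length")
    paths = []
    for k in range(len(rows[0])):
        out = Path(f"{prefix}{k + 1}.sim")
        out.write_text("\n".join(sim_lines([r[k] for r in rows], snp_ids)) + "\n")
        paths.append(out)
    return paths
```

For a K=7 run this yields seven files, one per synthetic "ancestral" population, each of which could then be passed to `plink --simulate`.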

Since I was interested in the PCA loadings of the "ancestral" populations, I used Eigensoft to model the differences between the components explicitly along continuous axes of variation. The calculated PCA loadings were then visualized as an interactive biplot with the R package BiplotGUI, using the following Biplots command:
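The core of a genotype PCA can be sketched in a few lines of NumPy: standardize each SNP and take a singular value decomposition. The normalization below (centring on 2p and dividing by sqrt(p(1-p))) is a simplified version of what Eigensoft's smartpca does by default, not a reimplementation of it; the function name `pca_scores` is hypothetical.

```python
import numpy as np


def pca_scores(genotypes, n_components=2):
    """PCA of an individuals x SNPs matrix of 0/1/2 genotype counts.

    Each SNP column is centred on its mean (2p) and scaled by
    sqrt(p(1-p)), p being the sample allele frequency. Monomorphic SNPs
    (p = 0 or 1) are dropped because they carry no variation.
    Returns (individual coordinates, per-SNP loadings).
    """
    g = np.asarray(genotypes, dtype=float)
    p = g.mean(axis=0) / 2.0
    keep = (p > 0) & (p < 1)
    x = (g[:, keep] - 2.0 * p[keep]) / np.sqrt(p[keep] * (1.0 - p[keep]))
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]  # individuals on the PCs
    loadings = vt[:n_components].T                   # SNP weights on the PCs
    return scores, loadings
```

The signs of the components are arbitrary (an SVD may flip any axis), so only the relative placement of populations along an axis is meaningful.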

> library(BiplotGUI)
> Biplots(Data = PCA[, -1], groups = PCA[, 1])

Afterwards I performed three statistical tests on the imported PCA loadings: linear regression, circular regression and Procrustes analysis.
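Procrustes analysis compares two point configurations after removing differences in position, scale and rotation. The post does not say which second configuration the PCA results were compared against, so the sketch below simply takes two generic n x d arrays; the function name `procrustes` is hypothetical, and the result should agree with the disparity reported by `scipy.spatial.procrustes`.

```python
import numpy as np


def procrustes(a, b):
    """Ordinary Procrustes analysis of two n x d point configurations.

    Both configurations are centred and scaled to unit Frobenius norm;
    b is then rotated (reflections allowed) and rescaled to best match a.
    Returns (standardised a, fitted b, disparity), where disparity is the
    residual sum of squares: 0 means identical shape, at most 1.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    # Orthogonal Procrustes: the SVD of b^T a gives the optimal rotation u @ vt,
    # and the sum of its singular values is the optimal scale factor.
    u, s, vt = np.linalg.svd(b.T @ a)
    fitted = s.sum() * (b @ (u @ vt))
    disparity = float(np.sum((a - fitted) ** 2))
    return a, fitted, disparity
```

Allowing reflections suits PCA output, because the sign of each principal component is arbitrary and a mirrored configuration should not count as a mismatch.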

1 comment:

  1. PCA doesn't imply ancestry; it extracts principal components from the data without "knowing" anything about their age. A component can reflect admixture, genetic drift or anything else that produces a distinct shape.