
Wednesday, May 25, 2011

Experimental analysis Part I: Supervised ADMIXTURE (K=5) analysis

While 23andMe and FF raw data files keep arriving, I decided to run a quick supervised ADMIXTURE analysis and compare its results with the LAMP/STRUCTURE output.
To keep the presentation clear and easy to follow, I will start with the simplest part of my analysis: the supervised ADMIXTURE run at K=5. A supervised analysis estimates individual ancestries more accurately because the ancestries of the reference individuals are specified in advance rather than inferred.
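The idea behind the supervised mode can be written down compactly. ADMIXTURE models individual i's expected frequency of the counted allele at SNP j as the ancestry-weighted average of the component frequencies, and maximizes the binomial log-likelihood below (up to an additive constant). As I understand the supervised mode, the rows of Q belonging to reference individuals are fixed to their assigned population, so only the allele frequencies F and the other individuals' ancestry fractions are estimated. The notation is mine: g_ij in {0, 1, 2} counts copies of the counted allele, q_ik is individual i's fraction of component k, f_kj is the allele frequency at SNP j in component k, and c(i) is the population assigned to reference individual i.

```latex
\mathcal{L}(Q, F) = \sum_{i}\sum_{j} \Big[ g_{ij} \ln\Big(\sum_{k=1}^{K} q_{ik} f_{kj}\Big) + (2 - g_{ij}) \ln\Big(\sum_{k=1}^{K} q_{ik}\,(1 - f_{kj})\Big) \Big]

\text{subject to } q_{ik} \ge 0, \quad \sum_{k=1}^{K} q_{ik} = 1, \quad 0 \le f_{kj} \le 1,

q_{ik} = \begin{cases} 1 & k = c(i) \\ 0 & k \ne c(i) \end{cases} \quad \text{for every reference individual } i, \qquad K = 5.
```

Because the reference rows are pinned, each component's allele frequencies are anchored to the chosen proxy populations, which is what makes the resulting proportions interpretable as "NW European", "North European" and so on.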
I ran the supervised ADMIXTURE analysis with 6 reference populations: Orcadians and Russians (Vologda) from the HGDP project, and Romanians, Hungarians, Russians (Tver), Lithuanians and Belorussians from a public dataset (Behar DM, Yunusbayev B, Metspalu M, Metspalu E, et al. The genome-wide structure of the Jewish people. Nature 2010 Jul 8;466(7303):238-42). In this particular case, Orcadians serve as an abstract proxy for the whole NW European component, and Romanians and Hungarians as proxies for the Central European component (Hungarians represent a more specific Subcarpathian component, while we consider Romanians to have more genetic affinity to the SE (Balkan) component). Russians from Vologda define the North European component, and finally Lithuanians, Belorussians and Russians from Tver are included to represent the main genetic component of North-Eastern Europe.
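In supervised mode, ADMIXTURE (`admixture --supervised`) reads a `.pop` file with the same prefix as the `.bed` file, containing one line per individual in `.fam` order: a population label for each reference individual and `-` for anyone whose ancestry should be estimated. The sketch below builds that file. The mapping of the seven reference samples onto five labels is only one possible reading of the paragraph above, and the label strings (both the FID values assumed to be in the `.fam` file and the component names) are hypothetical. The check that exactly K distinct labels appear is a sanity check I added, not something taken from the post.

```python
# Hypothetical mapping from population labels (assumed to be stored in the
# FID column of the merged .fam file) to the K = 5 supervised components.
POP_TO_COMPONENT = {
    "Orcadian": "NW_European",
    "Hungarian": "Subcarpathian",
    "Romanian": "Balkan",
    "Russian_Vologda": "North_European",
    "Lithuanian": "NE_European",
    "Belorussian": "NE_European",
    "Russian_Tver": "NE_European",
}


def read_fam(path):
    """Return (FID, IID) pairs from a PLINK .fam file, in file order."""
    with open(path) as handle:
        return [tuple(line.split()[:2]) for line in handle if line.strip()]


def build_pop_lines(fam_rows, pop_to_component, k):
    """Return one .pop line per .fam row, keeping .fam order.

    Reference individuals get their component label; everyone else
    (e.g. the MDL participants) gets "-", i.e. "estimate this person".
    """
    lines = [pop_to_component.get(fid, "-") for fid, _iid in fam_rows]
    labels = {label for label in lines if label != "-"}
    if len(labels) != k:
        raise ValueError(f"expected {k} reference labels, found {len(labels)}")
    return lines


def write_pop_file(path, lines):
    """Write the labels, one per line, as ADMIXTURE's .pop file."""
    with open(path, "w") as handle:
        handle.write("\n".join(lines) + "\n")
```

For example, `write_pop_file("mdl_pruned.pop", build_pop_lines(read_fam("mdl_pruned.fam"), POP_TO_COMPONENT, 5))` would place the labels next to a pruned dataset named `mdl_pruned.bed` (all file names here are placeholders).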

Before processing the reference data in PLINK, I removed 2 pairs of close relatives (2 Orcadians and 2 Hungarians) and 2 Romanians with Roma admixture. Next, I excluded SNPs with missing rates greater than 1% and pruned SNPs based on the variance inflation factor and pairwise genotypic correlation. After this pruning, I kept SNPs with MAF >= 0.05 and at most 1 missing allele per person. Finally, I performed LD-based pruning using a window size of 50, a step of 5 and an r^2 threshold of 0.3. This left a dataset of 121 individuals and roughly 140K SNPs.

In total, 27 participants of the MDL project were included in the ADMIXTURE run:


V158
V157
V160
V161
V162
V163
V164
V165
V201
V202
V166
V167
V168
V169
V170
V171
V172
V173
V174
V175
V176
V177
V178
V179
V180
V181
V182
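The PLINK filtering steps described earlier in this post could be scripted roughly as follows. This is a minimal sketch, not the exact commands I ran: the thresholds come from the text (SNP missingness at most 1%, MAF >= 0.05, LD pruning with window 50, step 5, r^2 0.3), while every file name is a hypothetical placeholder. The variance-inflation-factor pruning and the "at most 1 missing allele per person" criterion are left out because their exact parameters are not given here. The script only prints the commands, so it runs without PLINK or ADMIXTURE installed.

```python
import shlex

# Thresholds taken from the post; every file name below is a placeholder.
GENO_MAX_MISSING = "0.01"  # drop SNPs with more than 1% missing calls
MAF_MIN = "0.05"           # keep SNPs with minor allele frequency >= 0.05
LD_WINDOW, LD_STEP, LD_R2 = "50", "5", "0.3"


def qc_commands(raw_prefix, out_prefix, remove_file, k=5):
    """Return the PLINK/ADMIXTURE command lines as argument lists."""
    qc = f"{out_prefix}_qc"
    pruned = f"{out_prefix}_pruned"
    return [
        # Remove relatives / Roma-admixed samples (FID IID list in
        # remove_file), then filter SNPs by missingness and MAF.
        ["plink", "--bfile", raw_prefix, "--remove", remove_file,
         "--geno", GENO_MAX_MISSING, "--maf", MAF_MIN,
         "--make-bed", "--out", qc],
        # Write the LD-pruned SNP list to <qc>.prune.in / <qc>.prune.out.
        ["plink", "--bfile", qc,
         "--indep-pairwise", LD_WINDOW, LD_STEP, LD_R2, "--out", qc],
        # Keep only the pruned-in SNPs.
        ["plink", "--bfile", qc, "--extract", f"{qc}.prune.in",
         "--make-bed", "--out", pruned],
        # Supervised run; ADMIXTURE looks for <pruned>.pop next to the .bed.
        ["admixture", "--supervised", f"{pruned}.bed", str(k)],
    ]


if __name__ == "__main__":
    for cmd in qc_commands("reference_plus_mdl", "mdl", "samples_to_drop.txt"):
        print(shlex.join(cmd))
```

To execute the steps instead of printing them, each list can be passed to `subprocess.run(cmd, check=True)`. Note that the first command combines sample removal and both SNP filters in one PLINK call; PLINK applies such filters in its own fixed internal order, which may differ slightly from running them one at a time.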

The results are in a Google spreadsheet.
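ADMIXTURE writes its ancestry estimates to a file named `<prefix>.<K>.Q` (for a dataset called `mdl_pruned.bed` at K=5 that would be `mdl_pruned.5.Q`), with one row of K space-separated fractions per individual, in `.fam` order. A results table like the spreadsheet above can be produced by pairing those rows with the individual IDs. The sketch below is one way to do that, not necessarily how the published spreadsheet was made; the component names passed in are hypothetical, and which column corresponds to which reference label should be verified for a given run.

```python
import csv


def read_nonempty_lines(path):
    """Return the non-blank lines of a text file (e.g. a .Q file)."""
    with open(path) as handle:
        return [line for line in handle if line.strip()]


def q_rows_to_percentages(iids, q_lines, components):
    """Pair each ID with its ancestry percentages, rounded to 0.1.

    `iids` must be in .fam order, the order in which ADMIXTURE writes .Q rows.
    `components` names the K columns; the column-to-label mapping is an
    assumption to verify for each run.
    """
    if len(iids) != len(q_lines):
        raise ValueError("number of IDs and .Q rows differ")
    table = []
    for iid, line in zip(iids, q_lines):
        fractions = [float(x) for x in line.split()]
        if len(fractions) != len(components):
            raise ValueError(f"{iid}: expected {len(components)} columns")
        table.append([iid] + [round(100 * f, 1) for f in fractions])
    return table


def write_csv(path, components, table):
    """Write the table with a header row, ready to open in a spreadsheet."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["ID", *components])
        writer.writerows(table)
```

Taking the IIDs from the `.fam` file and the rows from `read_nonempty_lines` on the `.Q` file, then keeping only rows whose ID starts with `V`, would leave a table for the 27 MDL participants.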


3 comments:

  1. Thanks Vadim, very interesting... I appreciate all your work.
    If there is a list of pending IDs, please add me as #V179:
    Russian father from St. Petersburg (some Finnish influence there) and Russian mother from the Urals, but since she is H5b, I'm not sure where her line was before then... I am surprised at the % Hungarian showing here; I thought it would be more Belarus. Grateful for the 48% Russian!

  2. Great first ADMIXTURE run. As V173, my mtDNA line is H11a2, with a great-great-grandmother born in Poland.

  3. Here is a zipped spreadsheet that can be sorted, matching each ID with its overall calculated percentages. MS Excel 2007 format. Download and extract.

    http://tinyurl.com/MDLPart-ISupervAdmixK5
