Friday, August 19, 2011

Project's update: results for participants V157-V247

Here is the latest update of the MDL project. For the time being, I have decided to run the ADMIXTURE and STRUCTURE analyses simultaneously on the same "LD-pruned" dataset (a subset of the whole MDL dataset with markers corrected for some genetic effects, such as linkage disequilibrium between markers). The admixture coefficients for each participant and for the references can be downloaded as an Excel spreadsheet.
When we look at the global apportionment of ancestry coefficients in the STRUCTURE program, we would expect to see patterns that are similar to, but perhaps slightly different from, the ADMIXTURE results. However, the difference between the outcomes of the two programs was larger than expected. First of all, STRUCTURE estimated higher West_European ancestry (especially in samples from the Caucasian region), but the North_East_European and South European estimates were within 10% of one another, and overall the estimates were reasonably close in most cases (check this spreadsheet for technical details: Fst values, genetic distances etc.). Thus we may conclude that the estimates from the two programs are statistically correlated, while the differences are explained by the different approaches to estimating the co-ancestry coefficients.
STRUCTURE assumes admixture formed by k ancestral subpopulations, which are defined empirically based on the genetic distance between samples in the study set, as captured by the markers used in the analysis. Because the source or parental populations are not specified a priori, the clustering of samples can be screened across a variety of k values, and the k that gives the greatest likelihood, and/or the most distinct subpopulation clusters, or the most clearly defined mosaic of ancestry affiliations, can be selected as best. If adequate AIMs (ancestry-informative markers) are used and these methods are run on samples representative of the major continental populations, they tend to identify clusters at low values of k that fall along continental lines, and clusters at higher values of k that fall along ethnic subdivisions defined by other methods such as Principal Components Analysis (Rosenberg et al. 2002).
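For readers who want to repeat the comparison of the two programs on their own numbers, here is a minimal sketch in Python. It assumes you have copied one ancestry component (for example West_European) from both outputs into two lists in the same sample order. The values below are made up for illustration and are not project data, and the function names are my own.

```python
from math import sqrt

# Hypothetical fractions of one ancestry component for five samples,
# in the same sample order for both programs (illustrative values only).
admixture = [0.10, 0.25, 0.40, 0.05, 0.30]
structure = [0.15, 0.30, 0.42, 0.11, 0.33]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def mean_difference(x, y):
    """Average of (y - x); positive means y is systematically higher."""
    return sum(b - a for a, b in zip(x, y)) / len(x)


r = pearson(admixture, structure)
bias = mean_difference(admixture, structure)
print(f"correlation: {r:.3f}, mean STRUCTURE - ADMIXTURE difference: {bias:.3f}")
```

A high correlation together with a positive mean difference would match the pattern described above: the two programs rank the samples similarly, while one of them gives systematically higher values for that component.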

We started the ADMIXTURE analysis from k=3, resolving the Caucasian and North-West European groups from the Asian group. Moving to the more complex k=4 and k=5 population models, the Scandinavian and Baltic-North Sea components emerged, separating from the Altaic and Anatolian-Balkanic components to form a North-West European outgroup, while the Altaic and Anatolian-Balkanic groups show little intergroup affinity. The Baltic-North Sea component still shows substantial fractional affiliation with the Scandinavian group. Although the k=4 model may seem useful, since it neatly provides the most obvious anthropological partitions, the affinity between the Balkanic and the two North_European components, at least with these markers, suggests that the k=4 population model is not what we are looking for either. However, at k=5 the incidence of fractional affiliations dropped dramatically, and for the first time in the analysis of k-value models we can see the simplest of the clearly partitioned models, breaking the populations used in our project into five groups (Anatolian-Balkanic, Scandinavian, Balto-Slavic, Altaic, Celto-Germanic) that comport with specific major Eurasian metapopulation groups. The results are the same as at k=4, except that the combined Baltic-North Sea group now splits into two new subgroups, Balto-Slavic and Celto-Germanic, and the overall incidence of outside-group affiliation is much lower.
In our experiments, higher k-values showed equally clean genetic partitioning but k=5 was the simplest of the clearly partitioned models.
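One possible way to put a number on the "incidence of fractional affiliations" mentioned above is to count the samples whose largest cluster membership stays below some cut-off. This is only a sketch: the 0.8 cut-off, the function name and the small membership tables are my assumptions for illustration, not the criterion or data actually used in the project.

```python
def fractional_incidence(q_rows, threshold=0.8):
    """Share of samples whose largest membership coefficient is below threshold.

    q_rows: one list of membership coefficients per sample (each row sums to ~1).
    threshold: assumed cut-off for calling a sample clearly assigned.
    """
    mixed = sum(1 for row in q_rows if max(row) < threshold)
    return mixed / len(q_rows)


# Hypothetical membership tables for the same four samples at two k values.
q_k4 = [
    [0.55, 0.40, 0.03, 0.02],
    [0.90, 0.05, 0.03, 0.02],
    [0.45, 0.45, 0.05, 0.05],
    [0.10, 0.85, 0.03, 0.02],
]
q_k5 = [
    [0.88, 0.06, 0.03, 0.02, 0.01],
    [0.90, 0.04, 0.03, 0.02, 0.01],
    [0.05, 0.86, 0.05, 0.03, 0.01],
    [0.60, 0.30, 0.05, 0.03, 0.02],
]

print("k=4:", fractional_incidence(q_k4))
print("k=5:", fractional_incidence(q_k5))
```

A lower value at a given k means fewer samples are split between clusters, which is the sense in which one model can be called more cleanly partitioned than another.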

Due to some specific requests, I have also calculated an IBS (identity-by-state) matrix between the project's participants and the references (the matrix itself has been visualized in the standard form of a dendrogram). You can also look at the Mclust clustering of each participant and at the average values of homozygosity and inbreeding for all participants.
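To show what an IBS matrix contains, here is a minimal sketch in Python. It assumes genotypes are coded as 0, 1 or 2 copies of one allele per marker, with None for missing calls; two samples then share 2 - |difference| alleles identical by state at each marker. The sample names and genotypes are hypothetical, and this is not the exact procedure or software used for the project matrix.

```python
def ibs_similarity(g1, g2):
    """Mean proportion of alleles shared identical by state (0.0 to 1.0)."""
    shared = 0
    compared = 0
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue  # skip markers with a missing call in either sample
        shared += 2 - abs(a - b)  # 2, 1 or 0 alleles shared at this marker
        compared += 1
    if compared == 0:
        raise ValueError("no markers with calls in both samples")
    return shared / (2 * compared)


def ibs_matrix(genotypes):
    """Pairwise IBS similarity for a dict mapping sample id to genotype list."""
    ids = sorted(genotypes)
    return {i: {j: ibs_similarity(genotypes[i], genotypes[j]) for j in ids}
            for i in ids}


# Hypothetical genotypes at six markers (allele counts; None = missing).
samples = {
    "sample_a": [0, 1, 2, 1, 0, 2],
    "sample_b": [0, 1, 2, 2, 0, None],
    "ref_x": [2, 1, 0, 1, 2, 0],
}

matrix = ibs_matrix(samples)
for sample_id, row in matrix.items():
    print(sample_id, {other: round(value, 2) for other, value in row.items()})
```

A dendrogram like the one mentioned above is usually drawn from the corresponding distances (1 minus the similarity) with a hierarchical clustering routine.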

1 comment:

  1. I have posted a Root Mean Square comparison spreadsheet (Excel 2007 macro-enabled XLSM) for the MDL Project data. I am V173.

    The root mean square is used as a measure of the differences between IDs. The smallest RMS number calculated is the closest overall to you.

    If used in any other version, please check your original column cells and verify that your alternate software is handling sorting and reporting correctly.
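For anyone without Excel, the root-mean-square comparison described in the comment above can be sketched in a few lines of Python. This assumes the RMS is taken over the differences between two participants' admixture coefficients, component by component; the exact formula in the spreadsheet may differ. The IDs and coefficients below are hypothetical.

```python
from math import sqrt


def rms_difference(a, b):
    """Root mean square of the component-wise differences between two profiles."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same number of components")
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def rank_by_closeness(target_id, profiles):
    """Other ids sorted from smallest to largest RMS difference to target_id."""
    target = profiles[target_id]
    distances = [(rms_difference(target, coeffs), other_id)
                 for other_id, coeffs in profiles.items()
                 if other_id != target_id]
    return sorted(distances)


# Hypothetical admixture coefficients (five components per participant).
profiles = {
    "id_1": [0.40, 0.30, 0.20, 0.05, 0.05],
    "id_2": [0.38, 0.32, 0.20, 0.05, 0.05],
    "id_3": [0.10, 0.10, 0.20, 0.30, 0.30],
}

for distance, other_id in rank_by_closeness("id_1", profiles):
    print(other_id, round(distance, 4))
```

The first ID printed has the smallest RMS and is therefore the closest overall to the chosen participant.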