Sunday, October 14, 2012

ROLLOFF analysis of Poles, Belarusians and Russians from central regions of Russia

A month ago the notorious Reich lab released an alpha version of ADMIXTOOLS 1.0. The alpha package was developed for in-house use, so its operation is not always self-explanatory. The good thing, however, is that ADMIXTOOLS maintains full format compatibility with EIGENSOFT, another well-known software package developed by the same lab. This makes the learning curve of ADMIXTOOLS much gentler.

The aforementioned package features 6 cool programs, among which I find qp3Pop and rolloff the most useful. Due to the limitations of this post, I am not going to discuss qp3Pop in detail; for the purposes of my presentation it suffices to say that this program implements the three-population (f_3) test for treeness of populations from Reich et al. 2009.
Instead, I'd suggest reading the ADMIXTOOLS supporting material, Dienekes' posts and Reich's paper to get an idea of what the f_3 test is about.
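
For readers who want a feel for the f_3 test before following those links, here is a minimal sketch. It assumes the plain textbook form of the statistic, f_3(C; A, B) = mean over SNPs of (c - a)(c - b), where a, b and c are allele frequencies in the two sources and the target. The allele frequencies are made up for illustration, and this is not qp3Pop's own code: the real program applies a bias correction and estimates standard errors with a block jackknife, neither of which is reproduced here.

```python
def f3_naive(target, source1, source2):
    """Naive f_3(target; source1, source2): mean over SNPs of (c - a) * (c - b)."""
    if not (len(target) == len(source1) == len(source2)):
        raise ValueError("frequency lists must have equal length")
    products = [(c - a) * (c - b) for c, a, b in zip(target, source1, source2)]
    return sum(products) / len(products)


# Hypothetical allele frequencies at four SNPs in two source populations.
freq_a = [0.10, 0.80, 0.30, 0.60]
freq_b = [0.70, 0.20, 0.90, 0.40]

# A 50/50 mixture of A and B with no drift after admixture (an idealization).
freq_mix = [0.5 * a + 0.5 * b for a, b in zip(freq_a, freq_b)]

# The mixture sits between its sources, so the products are negative.
f3_mix = f3_naive(freq_mix, freq_a, freq_b)

# Using an unadmixed population as the target gives a non-negative value.
f3_tree = f3_naive(freq_a, freq_mix, freq_b)

print(f3_mix, f3_tree)
```

A clearly negative value is the signature of admixture in the target; a positive value does not rule admixture out, because drift in the target since the mixing event pushes the statistic upward.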

The ROLLOFF method, however, needs a closer look. This is a method that measures the time since admixture. It does so by looking at the linkage disequilibrium between SNPs that is due to admixture. Now it is time to recall the standard definition of linkage disequilibrium. Linkage disequilibrium (hereafter LD) is the nonrandom association between two alleles such that certain combinations are more likely to occur together. As two SNPs get farther apart, we expect there to be less admixture LD. The rate of decline of admixture LD is directly related to the number of generations since admixture, since that number determines how many recombinations have occurred between any two SNPs. In short: rolloff fits an exponential curve to a plot of admixture LD vs. genetic distance, and uses the rate of exponential decline to calculate the number of generations since admixture. Given that one generation is roughly equal to 29 years, one can convert the number of generations since admixture into years.
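
The curve-fitting idea can be sketched in a few lines. The example below assumes the simplest possible model, LD(d) = A0 * exp(-n * d), with d the genetic distance in Morgans and n the number of generations, and recovers n with an ordinary log-linear least-squares fit on a synthetic, noise-free curve. The function name, the amplitude and the distance grid are all made up for illustration; this is a stand-in for the idea, not rolloff's own fitting routine or its jackknife error estimate.

```python
import math


def fit_generations(distances_morgans, ld_values):
    """Least-squares fit of ln(LD) = ln(A0) - n * d; returns (n, A0)."""
    xs = list(distances_morgans)
    ys = [math.log(v) for v in ld_values]  # needs strictly positive LD values
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -slope, math.exp(intercept)


# Synthetic curve: admixture 150 generations ago, hypothetical amplitude 0.02.
true_generations = 150.0
distances = [0.001 * k for k in range(5, 51)]  # 0.5 cM to 5 cM, in Morgans
ld_curve = [0.02 * math.exp(-true_generations * d) for d in distances]

generations, amplitude = fit_generations(distances, ld_curve)
years = generations * 29  # 29 years per generation, as in the text

print(generations, amplitude, years)
```

On real data the curve is noisy and can dip below zero at large distances, so a log-linear fit like this one would break down; a nonlinear fit of the exponential is the safer choice there.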

Dienekes has already tested the ADMIXTOOLS programs with various worldwide populations, and among others he has carried out f_3 and rolloff analyses of Poles, Lithuanians, and Ukrainians.

Below are words from the man himself:

Using the aforementioned idea, I set out to see whether Lithuanians, who occupy the European end of the Europe-South Asia cline present such a signal of admixture LD. I used the Lithuanian_D sample from the Dodecad Project and the Balochi HGDP sample as reference populations (to calculate allele frequency differences), and the Behar et al. (2010) Lithuanians for admixture LD. There were only ~300k SNPs usable in this set, but sufficient to detect the signal of admixture LD:
The admixture time estimate is 200.350 +/- 61.608 generations, or 5,810 +/- 1790 years. This is not very precise, probably because of the small number of SNPs and individuals used, but it certainly points to the Neolithic-to-Bronze Age for the occurrence of this admixture. The date is certainly reminiscent of the expansion of the Kurgan culture out of eastern Europe, or, the later Corded Ware culture of northern Europe.

So, it may well appear that at least some of the people participating in these groups of cultures, were indeed influenced by the Indo-Europeans as they expanded from their West Asian homeland. These intruders mixed with eastern Europeans who vacillated during the late Neolithic between a northern Europeoid pole akin to Mesolithic hunter gatherers from Gotland and Iberia, and a widely dispersed Sardinian-like population that is in evidence at least in the Sweden-Italian Alps-Bulgaria triangle. The gradual appearance of non-mtDNA U related lineages in Siberia and Ukraine is most likely related to this phenomenon.

I have carried out rolloff analysis of my 25-strong Polish_D sample using Lithuanians and Pathans as references:

The signal is fairly distinct, and corresponds to 149.296 +/- 38.783 generations or 4330 +/- 1120 years. I am guessing that either the different reference population (Pathans vs. Balochi), or, more likely the increased number of target individuals (25 vs. 10) have contributed to the narrowing down of the uncertainty. It will be interesting to explore this signal further with more population pairs.


I have used the Yunusbayev et al. sample of Ukrainians, and estimated its admixture time using Lithuanians and Balochi as reference populations: The admixture time estimate is 191.078 +/- 35.079 generations, or 5,540 +/- 1,020 years. It seems very similar to that in Lithuanians, with a smaller standard error, perhaps on account of either the larger number of SNPs or larger number of individuals.

It is tempting to associate this admixture signal with the Maikop culture which appeared at around this time. Assuming that North_European/West_Asian (or Lithuanian-like and Balochi-like) gene pools existed north and south of the Pontic-Caspian-Caucasus set of geographical barriers, then the Maikop culture which shows links to both the early Transcaucasian culture and those of Eastern Europe would have been an ideal candidate region for the admixture picked up by rolloff to have taken place. There are, of course, other possibilities.

As always, Dienekes' analysis drew a lot of criticism from another genome blogger, Davidski from Eurogenes. In his latest post he argued that it's difficult to say what exactly this experiment was testing, because Pathans aren't pure West Asians and Lithuanians aren't pure Mesolithic Europeans. He also claimed that Dienekes' interpretations are wrong, because the f3-statistics and rolloff tests are basically picking up (belated) signals of the Mesolithic and Neolithic peopling of Europe.

Since the aforementioned populations have the strongest presence in my dataset, and there is no consensus among genome bloggers on how to interpret the ADMIXTOOLS output, I've decided to put in my 5 cents and test ADMIXTOOLS myself.

For the purposes of this analysis, I created an ad-hoc dataset which includes 750,000 SNPs sampled in 250 worldwide populations. Next, I made 3 x 62,000 trios of the form (X, Y; Z), where X and Y are two paired reference populations, and Z is one of three target populations: central Russians, Poles and Belarusians. After that I ran the qp3Pop analysis on those trios.
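
Building the trio list is easy to script. The sketch below is one way to do it, under two assumptions that are not confirmed by the run described above: that the target itself is among the 250 population labels and is excluded from the reference pairs, and that the population list is written one trio per line with the two references first and the target last. The population labels in the example are a small made-up subset. With 250 labels this gives 249 x 248 = 61,752 ordered pairs per target, which is close to the 62,000 quoted above.

```python
from itertools import permutations


def make_trios(populations, target):
    """Every ordered (X, Y, target) trio with X != Y and neither equal to target."""
    sources = [p for p in populations if p != target]
    return [(x, y, target) for x, y in permutations(sources, 2)]


# Hypothetical subset of labels; the real dataset had about 250 of them.
populations = ["Latvian", "Estonian", "Italian-North", "Sardinian", "Polish"]
trios = make_trios(populations, "Polish")

# One trio per line: two references first, target last (assumed layout).
lines = [" ".join(trio) for trio in trios]
print("\n".join(lines))

# Size check for a full-scale run with 250 placeholder labels.
full_scale = make_trios([f"pop{i}" for i in range(250)], "pop0")
print(len(full_scale))
```

Because ordered pairs are used, every pair is tested twice, once as (X, Y) and once as (Y, X). The f_3 statistic is symmetric in its two references, so both orderings give identical numbers; switching to itertools.combinations would halve the run time without losing anything.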

From the results obtained, I picked out only the trios with the most negative Z-scores for each target:


X    Y    Z (target)    f_3    std. err.    Z-score
Estonian    Jew-Iraqi    Polish    -0.002039    0.000179    -11.368
Jew-Iraqi    Estonian    Polish    -0.002039    0.000179    -11.368
Italian-North    Latvian    Polish    -0.001211    0.000109    -11.098
Latvian    Italian-North    Polish    -0.001211    0.000109    -11.098
Estonian    Italian-North    Polish    -0.001023    0.000093    -11.037
Italian-North    Estonian    Polish    -0.001023    0.000093    -11.037
Estonian    Jew-Iran    Polish    -0.001861    0.000172    -10.831
Jew-Iran    Estonian    Polish    -0.001861    0.000172    -10.831
Armenian    Estonian    Polish    -0.001425    0.000136    -10.505
Estonian    Armenian    Polish    -0.001425    0.000136    -10.505
Italian-South    Latvian    Polish    -0.001344    0.000129    -10.458
Latvian    Italian-South    Polish    -0.001344    0.000129    -10.458
Cypriot    Estonian    Polish    -0.001626    0.000161    -10.113
Estonian    Cypriot    Polish    -0.001626    0.000161    -10.113


North_Amerind    Sardinian    Russian_Center    -0.004202    0.000479    -8.779
Sardinian    North_Amerind   Russian_Center    -0.004202    0.000479    -8.779
Basque    Ket    Russian_Center    -0.003771    0.000444    -8.493
Ket    Basque    Russian_Center    -0.003771    0.000444    -8.493
Karitiana    Sardinian    Russian_Center    -0.005947    0.000704    -8.453
Sardinian    Karitiana    Russian_Center    -0.005947    0.000704    -8.453
Pima    Sardinian    Russian_Center    -0.004908    0.000605    -8.117
Sardinian    Pima    Russian_Center    -0.004908    0.000605    -8.117
Ket    Sardinian    Russian_Center    -0.004295    0.000552    -7.786
Sardinian    Ket    Russian_Center    -0.004295    0.000552    -7.786
Lithuanian    Oroqen    Russian_Center    -0.00344    0.000445    -7.731
Oroqen    Lithuanian    Russian_Center    -0.00344    0.000445    -7.731
Basque    Karitiana    Russian_Center    -0.004629    0.000609    -7.596
Karitiana    Basque    Russian_Center    -0.004629    0.000609    -7.596
Basque    Pima    Russian_Center    -0.003711    0.000492    -7.545
Pima    Basque    Russian_Center    -0.003711    0.000492    -7.545
North_Amerind    Sardinian    Russian_Center    -0.003465    0.000462    -7.505
Sardinian    North_Amerind   Russian_Center    -0.003465    0.000462    -7.505
Basque    Nganassan    Russian_Center    -0.003574    0.000478    -7.471


Indian    Polish    Belarusian    -0.000736    0.000251    -2.935
Polish    Indian    Belarusian    -0.000736    0.000251    -2.935
Karitiana    Sardinian    Belarusian    -0.001278    0.000517    -2.471
Sardinian    Karitiana    Belarusian    -0.001278    0.000517    -2.471
Otzi    North_Amerind    Belarusian    -0.002556    0.001126    -2.271
Cirkassian    Polish    Belarusian    -0.000488    0.000231    -2.113
Polish    Cirkassian    Belarusian    -0.000488    0.000231    -2.113
Pima    Otzi    Belarusian    -0.002727    0.00137    -1.99
Pima    Sardinian    Belarusian    -0.000794    0.000431    -1.843
Sardinian    Pima    Belarusian    -0.000794    0.000431    -1.843
Otzi    Surui    Belarusian    -0.002938    0.001931    -1.522
Surui    Otzi    Belarusian    -0.002938    0.001931    -1.522


At first glance, the results of my ad-hoc experiment with qp3Pop seem to be consistent with the findings of the Patterson et al. 2012 paper: "the most striking finding is a clear signal of admixture into northern Europe, with one ancestral population related to present day Basques and Sardinians, and the other related to present day populations of northeast Asia and the Americas. This likely reflects a history of admixture between Neolithic migrants and the indigenous Mesolithic population of Europe, consistent with recent analyses of ancient bones from Sweden and the sequencing of the genome of the Tyrolean ‘Iceman’".

Indeed, the admixture in Poles can be modeled as a mixture of Neolithic and Mesolithic populations of Europe, while Russians and Belarusians can be represented as a mixture of Neolithic populations of Europe and the population ancestral to modern populations of NE Asia and the Amerindians.
However, a more careful examination of the results reveals additional signals of admixture in two of the three target populations: Poles and Belarusians.
Although it is perfectly possible to treat Estonians and Latvians as modern-day proxies for the northeastern populations of Mesolithic Europe, it is also clear that these populations could (at least in theory) carry a significant genetic legacy of the Baltic branch of the Indo-European Corded Ware culture. On the other hand, the second component of admixture in Poles is itself a product of admixture between Near Eastern/Anatolian-like Neolithic populations and a more recent genetic stratum, which is probably related to the massive migration of R1b people (the ancestors of the 'Bell Beakers') from NE Asia to Western Europe.

Given that, I'd suggest rewriting the components of admixture in Poles in the following manner:

Pole = ((Neolithic_populations of Europe) + "Bell Beakerish-like") + ((Mesolithic_populations) + "Corded_Ware" component) [1]

In Belarusians, the sources of the additional admixture signals are less clear.
As was shown earlier, in terms of formal admixture analysis (f3 statistics), Belarusians can be represented as a mixture of Poles and Indians/Circassians (Cirkassian in the table above). The first component is already known (see [1] above); the second, according to the results, must resemble a component common to both Indians and Circassians. From history textbooks I have learned that in the 1st millennium AD the territory of modern Karachay-Cherkessia was occupied by the Alans (or Alani), a group of Sarmatian tribes: nomadic pastoralists who spoke an Eastern Iranian language that derived from Scytho-Sarmatian and in turn evolved into modern Ossetian. The only currently known recent population ancestral to both the modern descendants of the Alans and modern Indians is the Scytho-Sarmatian metapopulation.

Thus, we can rewrite the admixture formula for Belarusians in the following manner:

Belarusian = (((Neolithic_populations of Europe) + "Bell Beakerish-like") + ((Mesolithic_populations) + "Corded_Ware" component)) + Scytho-Sarmatian-like

Now, after a long discussion, it is time for some admixture-dating fun!

Admixture dating with ROLLOFF

To estimate the admixture date in the Polish population, I used Latvians and North Italians as reference populations.

The admixture date is 119.670 +/- 37.145 generations ago, which corresponds to 3470 +/- 1077 years before present, or about 1460 BC +/- 1077 years. The upper limit of this dating seems to overlap with the timescale of the Unetice culture. The Bronze Age in Poland, as elsewhere in central Europe, begins with the innovative Unetice culture, which existed in Silesia and a part of Greater Poland during the first period of this era, that is from before 2200 to 1600 BC. The origins of this settled agricultural society combined the conservative traditions inherited from the Corded Ware populations with dynamic elements of the Bell Beaker people. Significantly, the Unetice people cultivated contacts with the highly developed cultures of the Carpathian Basin, through which they had trade links with the cultures of early Greece. Their culture also echoed influences coming all the way from the civilizations of the Middle East, the most highly developed of that time.

To estimate the admixture date in the Belarusian population, I used Poles and Indians as reference populations (note: I also lowered the genetic distance threshold in the ROLLOFF parameters to reduce noise from more recent admixtures).

As you can see, the admixture signal is harder to detect, and consequently the margin of error in the admixture date is considerably larger than in the previous example: 154.158 +/- 87.024 generations ago (or 4470 +/- 2523 years before present, about 2460 BC +/- 2523 years).
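
The conversion from generations to calendar dates is simple arithmetic, sketched below for the two rolloff estimates given above. The 29 years per generation comes from the text; taking 2012 (the year of this post) as the "present" is an assumption, and the function names are made up for illustration. Counting that many years back from 2012 puts both point estimates in the BC era.

```python
def admixture_date(generations, std_err, years_per_generation=29, reference_year=2012):
    """Convert a rolloff estimate to (years before present, error in years, calendar year).

    The calendar year is negative for BC dates; the missing year 0 is ignored.
    """
    years_bp = generations * years_per_generation
    years_err = std_err * years_per_generation
    calendar_year = reference_year - years_bp
    return years_bp, years_err, calendar_year


def format_year(calendar_year):
    """Render a signed calendar year as a BC/AD label."""
    year = round(calendar_year)
    return f"{-year} BC" if year < 0 else f"{year} AD"


# Rolloff estimates quoted in the text (generations, standard error).
polish = admixture_date(119.670, 37.145)
belarusian = admixture_date(154.158, 87.024)

for label, (years_bp, years_err, cal) in (("Polish", polish), ("Belarusian", belarusian)):
    print(f"{label}: {years_bp:.0f} +/- {years_err:.0f} years BP, about {format_year(cal)}")
```

Changing reference_year or years_per_generation shifts the calendar dates by a few decades at most, far less than the standard errors involved.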