Various methods for detecting identity by descent (IBD), including
those implemented in the software programs fastIBD, GERMLINE, and Chromopainter, have been developed
in the past several years using population genotype data from microarray platforms.
Now, next-generation DNA sequencing data are becoming increasingly available, enabling
the comprehensive analysis of genomes, including the identification of rare variants. These
sequencing data may provide an opportunity to detect IBD with higher resolution than
previously possible, potentially enabling the detection of disease-causing loci that
were undetectable with sparser genetic data.
fastIBD is a computationally efficient method for detecting identity by descent. The fastIBD algorithm starts by sampling a fixed number of haplotype
pairs (four pairs by default) for each individual from the posterior
haplotype distribution. Each sampled haplotype corresponds to a sequence
of hidden Markov model (HMM) states. The fastIBD algorithm searches for
pairs of sampled haplotypes sharing the same sequence of HMM states for
a set of consecutive markers. If the pair of sampled haplotypes belongs
to two distinct individuals, the shared haplotype tract is recorded.
For each pair of individuals, overlapping shared haplotype tracts are
merged, and the merged shared haplotype tract is a mosaic of pairs of
sampled haplotypes. The method is implemented in BEAGLE, a very popular genetic analysis software package.
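
As a rough illustration of the search-and-merge steps described above, here is a toy Python sketch. It is not BEAGLE's implementation. The function names, the `min_len` marker-count threshold, and the representation of each sampled haplotype as a plain list of HMM state IDs are all assumptions made for the example, and the real fastIBD decides which tracts to keep with a score threshold rather than a simple minimum length.

```python
from itertools import combinations

def shared_runs(states_a, states_b, min_len):
    """Return (start, end) marker intervals (end exclusive) where two
    sampled haplotypes pass through identical HMM states."""
    runs, start = [], None
    for i, (a, b) in enumerate(zip(states_a, states_b)):
        if a == b:
            if start is None:
                start = i          # a new shared run begins here
        else:
            if start is not None and i - start >= min_len:
                runs.append((start, i))
            start = None
    n = min(len(states_a), len(states_b))
    if start is not None and n - start >= min_len:
        runs.append((start, n))    # run extends to the last marker
    return runs

def merge_tracts(tracts):
    """Merge overlapping or directly adjacent (start, end) intervals."""
    merged = []
    for start, end in sorted(tracts):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def toy_fastibd(sampled, min_len=3):
    """sampled: {individual: [HMM state sequence, ...]} (hypothetical layout).
    Only haplotypes from two distinct individuals are compared.
    Returns {(ind1, ind2): merged shared tracts}."""
    result = {}
    for ind1, ind2 in combinations(sorted(sampled), 2):
        tracts = []
        for h1 in sampled[ind1]:
            for h2 in sampled[ind2]:
                tracts.extend(shared_runs(h1, h2, min_len))
        if tracts:
            result[(ind1, ind2)] = merge_tracts(tracts)
    return result

# Made-up state sequences for three individuals
sampled = {
    "A": [[1, 2, 3, 4, 5, 6], [7, 7, 7, 7, 7, 7]],
    "B": [[1, 2, 3, 4, 9, 9], [0, 0, 3, 4, 5, 6]],
    "C": [[8, 8, 8, 8, 8, 8]],
}
print(toy_fastibd(sampled))  # {('A', 'B'): [(0, 6)]}
```

In this example the merged tract for A and B is a mosaic: markers 0 to 3 come from one pair of sampled haplotypes and markers 2 to 5 from another.
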
Similar IBD analyses have already been carried out by other genome bloggers.
Dienekes Pontikos has performed several fastIBD analyses to measure the degree of sharing between various Eurasian and African groups. Davidski of the Eurogenes project has also used the fastIBD method in his intra-European chromosome paintings.
Inspired by those analyses, I
decided to run an IBD sharing analysis of the World 22 calculator dataset using the robust
and powerful fastIBD software (with an increased IBD detection threshold) and the ibd2segment script.
I have created an ad hoc subset of various East-European populations, comprising samples from the following populations:
Mordovian
Sorb
Hungarian
Belarusian
Tatar
Lithuanian
Polish
Bosnian
Ukrainian
Slovakian
Nogai
Serbian
Estonian
German
Swedish
Macedonian
Latvian
Moldavian
Montenegrin
Bulgarian
NorthOssetian
Kazakh
Slovenian
Uzbek
Adygei
Armenian
British
Czech
Orcadian
Russian
Turk
I have calculated the sum of shared IBD segments (measured in centimorgans, cM) for each pair of populations. The resulting matrix of pairwise sharing is visualized in the following heat maps. The populations in the first heat map are clustered according to z-score values: high values indicate a high degree of IBD sharing, while low values indicate a low degree of IBD sharing.
In the second heat map, the tree-like hierarchical grouping of populations is based on the total length in cM of the segments shared by each pair of populations.
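
The bookkeeping behind such a sharing matrix can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `(sample1, sample2, length_cM)` tuple layout, the `pop_of` mapping, and the function names are hypothetical and do not reflect the output format of fastIBD or ibd2segment. The sketch also assumes z-scores are computed across all population pairs, which may differ from the scaling used for the heat map.

```python
from collections import defaultdict
from statistics import mean, pstdev

def population_sharing(segments, pop_of):
    """Sum IBD segment lengths (cM) for each unordered pair of populations.
    segments: iterable of (sample1, sample2, length_cM) tuples."""
    totals = defaultdict(float)
    for s1, s2, cm in segments:
        # Sort the labels so (Polish, Tatar) and (Tatar, Polish) share one key
        pair = tuple(sorted((pop_of[s1], pop_of[s2])))
        totals[pair] += cm
    return dict(totals)

def z_scores(totals):
    """Standardize the summed cM values across all population pairs."""
    values = list(totals.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return {pair: 0.0 for pair in totals}  # all pairs share equally
    return {pair: (v - mu) / sigma for pair, v in totals.items()}

# Made-up samples and segment lengths, for illustration only
pop_of = {"s1": "Polish", "s2": "Polish", "s3": "Ukrainian", "s4": "Tatar"}
segments = [
    ("s1", "s3", 2.0),
    ("s2", "s3", 4.0),
    ("s1", "s4", 1.0),
    ("s3", "s4", 2.0),
]

totals = population_sharing(segments, pop_of)
scores = z_scores(totals)
for pair in sorted(totals):
    print(pair, totals[pair], round(scores[pair], 2))
```

A positive z-score marks a population pair that shares more total cM than the average pair, and a negative one marks a pair that shares less.
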
I've also made some visualizations of IBD sharing for selected East-European populations.
UPDATE I: I've uploaded a spreadsheet with IBD pairwise-sharing values to Google Drive.
How do you interpret the results? Part of my results shows Near East and West Asia. What countries would that be considered?
Pygmy -
West-Asian 6.06 Pct
North-European-Mesolithic -
Indo-Tibetan 1.46 Pct
Mesoamerican 6.39 Pct
Arctic-Amerind -
South-America_Amerind 2.21 Pct
Indian 0.33 Pct
North-Siberean -
Atlantic_Mediterranean_Neolithic 31.6 Pct
Samoedic -
Indo-Iranian 2.29 Pct
East-Siberean 0.73 Pct
North-East-European 25.97 Pct
South-African -
North-Amerind 5.44 Pct
Sub-Saharian 7.89 Pct
East-South-Asian -
Near_East 9.38 Pct
Melanesian -
Paleo-Siberian -
Austronesian 0.24 Pct