Magnus Ducatus Lituaniae Project: World-22 dataset: fastIBD analysis

Various methods for detecting identity by descent (IBD), including those implemented in the software programs fastIBD, GERMLINE, and ChromoPainter, have been developed in the past several years using population genotype data from microarray platforms. Now, next-generation DNA sequencing data are becoming increasingly available, enabling comprehensive analysis of genomes, including the identification of rare variants. These sequencing data may provide an opportunity to detect IBD with higher resolution than previously possible, potentially enabling the detection of disease-causing loci that were previously undetectable with sparser genetic data.

fastIBD is a fast and computationally efficient method for detecting identity by descent. The fastIBD algorithm starts by sampling a fixed number of haplotype pairs (four pairs by default) for each individual from the posterior haplotype distribution. Each sampled haplotype corresponds to a sequence of hidden Markov model (HMM) states. The fastIBD algorithm searches for pairs of sampled haplotypes sharing the same sequence of HMM states for a set of consecutive markers. If the pair of sampled haplotypes belongs to two distinct individuals, the shared haplotype tract is recorded. For each pair of individuals, overlapping shared haplotype tracts are merged, and the merged shared haplotype tract is a mosaic of pairs of sampled haplotypes. The method has been implemented in BEAGLE, a very popular genetic analysis software package.
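
The merging step described above can be sketched roughly as follows. This is only an illustration of the idea, not BEAGLE's actual code: the function name merge_shared_tracts and the tuple layout (individual A, individual B, start marker, end marker) are my own assumptions, and tracts are treated as half-open marker ranges.

```python
from collections import defaultdict


def merge_shared_tracts(tracts):
    """Merge overlapping shared tracts for each pair of individuals.

    `tracts` is an iterable of (ind_a, ind_b, start, end) tuples, where
    start/end are marker indices (start inclusive, end exclusive).
    This layout is a hypothetical one chosen for illustration.
    """
    by_pair = defaultdict(list)
    for ind_a, ind_b, start, end in tracts:
        if ind_a == ind_b:
            continue  # only tracts between two distinct individuals are recorded
        pair = tuple(sorted((ind_a, ind_b)))  # (A, B) and (B, A) are the same pair
        by_pair[pair].append((start, end))

    merged = {}
    for pair, intervals in by_pair.items():
        intervals.sort()
        out = [intervals[0]]
        for start, end in intervals[1:]:
            last_start, last_end = out[-1]
            if start < last_end:  # overlaps the previous tract: extend it
                out[-1] = (last_start, max(last_end, end))
            else:  # no overlap: start a new tract
                out.append((start, end))
        merged[pair] = out
    return merged
```

Each merged tract can then be built from several different sampled haplotype pairs, which is why the text above calls it a mosaic.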

Similar IBD analyses have already been carried out by other genome bloggers.

Dienekes Pontikos has performed several fastIBD analyses to detect the degree of sharing between various Eurasian and African groups. Davidski of the Eurogenes project has also used the fastIBD method in his intra-European chromosome paintings.

Inspired by those analyses, I decided to run an IBD sharing analysis of the World-22 calculator dataset using the robust and powerful fastIBD software (with an increased IBD detection threshold) and the ibd2segment script.

I have created an ad hoc subset of various East-European populations, including samples from the following populations:

Mordovian

Sorb

Hungarian

Belarusian

Tatar

Lithuanian

Polish

Bosnian

Ukrainian

Slovakian

Nogai

Serbian

Estonian

German

Swedish

Macedonian

Latvian

Moldavian

Montenegrin

Bulgarian

NorthOssetian

Kazakh

Slovenian

Uzbek

Adygei

Armenian

British

Czech

Orcadian

Russian

Turk

I have calculated the sum of shared IBD segments (measured in centimorgans, cM). The resulting matrix of pairwise sharing is visualized in the following heat maps. The populations in the first heat map are clustered according to z-score values: high values indicate a high degree of IBD sharing, while low values indicate a low degree of IBD sharing.
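
For readers who want to reproduce this kind of summary, here is a minimal sketch of the two steps: summing segment lengths per population pair and standardizing the sums. It assumes segments are available as (individual A, individual B, length in cM) tuples and that the z-scores are taken over all population-pair totals; both are my assumptions for illustration, and the function names are hypothetical rather than part of fastIBD or ibd2segment.

```python
from collections import defaultdict
from statistics import mean, pstdev


def pairwise_sharing(segments, population_of):
    """Sum shared IBD length (cM) for every pair of populations.

    `segments` holds (ind_a, ind_b, length_cm) tuples and `population_of`
    maps an individual ID to its population label (assumed inputs).
    """
    totals = defaultdict(float)
    for ind_a, ind_b, length_cm in segments:
        # Sort the labels so (X, Y) and (Y, X) accumulate in one cell.
        pops = tuple(sorted((population_of[ind_a], population_of[ind_b])))
        totals[pops] += length_cm
    return dict(totals)


def z_scores(totals):
    """Standardize totals: (value - mean) / standard deviation."""
    values = list(totals.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return {pair: 0.0 for pair in totals}  # all totals identical
    return {pair: (value - mu) / sigma for pair, value in totals.items()}
```

A positive z-score then marks a population pair that shares more than the average pair in the matrix, and a negative one marks a pair that shares less.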

In the second heat map, a tree-like hierarchical grouping of populations is based on the total length (in cM) of the segments shared by each pair of populations.
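
A tree-like grouping of this kind can be sketched with plain average-linkage clustering. The version below is an assumption-laden illustration, not the procedure behind the heat map: the function name average_linkage is hypothetical, and turning sharing into a distance by subtracting it from the largest observed value is just one simple choice.

```python
def average_linkage(labels, sharing):
    """Agglomerative clustering of populations; returns a nested-tuple tree.

    `sharing` maps a (pop_a, pop_b) tuple to total shared cM and must
    contain every pair of labels in one of the two orders (assumed input).
    """
    max_share = max(sharing.values())

    def dist(a, b):
        # More sharing means a smaller distance (illustrative conversion).
        key = (a, b) if (a, b) in sharing else (b, a)
        return max_share - sharing[key]

    # Each cluster is (tuple of member labels, tree built so far).
    clusters = [((label,), label) for label in labels]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                mi, mj = clusters[i][0], clusters[j][0]
                # Average distance over all cross-cluster pairs.
                d = sum(dist(a, b) for a in mi for b in mj) / (len(mi) * len(mj))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = (clusters[i][0] + clusters[j][0], (clusters[i][1], clusters[j][1]))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][1]
```

Populations that share the most cM are joined first and end up deepest in the nested tuples, which corresponds to the closest branches of a dendrogram.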

I've also made some visualizations of IBD sharing for selected East-European populations.

UPDATE I: I've uploaded a spreadsheet with the pairwise IBD sharing values to Google Drive.

1 comment:

Lucy R, June 2, 2021 at 6:29 AM
How do you interpret the results? In part of my results it has Near East and West Asia, what countries would that be considered?
Pygmy -
West-Asian 6.06 Pct
North-European-Mesolithic -
Indo-Tibetan 1.46 Pct
Mesoamerican 6.39 Pct
Arctic-Amerind -
South-America_Amerind 2.21 Pct
Indian 0.33 Pct
North-Siberean -
Atlantic_Mediterranean_Neolithic 31.6 Pct
Samoedic -
Indo-Iranian 2.29 Pct
East-Siberean 0.73 Pct
North-East-European 25.97 Pct
South-African -
North-Amerind 5.44 Pct
Sub-Saharian 7.89 Pct
East-South-Asian -
Near_East 9.38 Pct
Melanesian -
Paleo-Siberian -
Austronesian 0.24 Pct

Saturday, October 13, 2012
