Saturday, October 13, 2012

World-22 dataset: fastIBD analysis

Various methods for detecting identity by descent (IBD), including those implemented in the software programs fastIBD, GERMLINE, and Chromopainter, have been developed in the past several years using population genotype data from microarray platforms. Now, next-generation DNA sequencing data are becoming increasingly available, enabling the comprehensive analysis of genomes, including the identification of rare variants. These sequencing data may provide an opportunity to detect IBD at higher resolution than previously possible, potentially enabling the detection of disease-causing loci that were undetectable with sparser genetic data.

fastIBD is a fast, computationally efficient method for detecting identity by descent. The fastIBD algorithm starts by sampling a fixed number of haplotype pairs (four pairs by default) for each individual from the posterior haplotype distribution. Each sampled haplotype corresponds to a sequence of hidden Markov model (HMM) states. The algorithm then searches for pairs of sampled haplotypes that share the same sequence of HMM states over a set of consecutive markers. If the two sampled haplotypes belong to two distinct individuals, the shared haplotype tract is recorded. For each pair of individuals, overlapping shared haplotype tracts are merged, so each merged tract is a mosaic of pairs of sampled haplotypes. The method is implemented in BEAGLE, a widely used genetic analysis software package.
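The merging step described above can be illustrated with a minimal Python sketch. This is not BEAGLE's code: the input layout (two sample IDs plus inclusive start and end marker indices) and the function name `merge_tracts` are assumptions made purely for illustration.

```python
from collections import defaultdict


def merge_tracts(tracts):
    """Merge overlapping shared tracts for each pair of individuals.

    `tracts` is an iterable of (id_a, id_b, start, end) tuples, where start
    and end are inclusive marker indices (a hypothetical layout).
    Returns {(id_a, id_b): [(start, end), ...]} with sorted, merged intervals.
    """
    by_pair = defaultdict(list)
    for id_a, id_b, start, end in tracts:
        if id_a == id_b:
            continue  # only tracts between two distinct individuals are kept
        pair = tuple(sorted((id_a, id_b)))  # treat (A, B) and (B, A) alike
        by_pair[pair].append((start, end))

    merged = {}
    for pair, intervals in by_pair.items():
        intervals.sort()
        out = [intervals[0]]
        for start, end in intervals[1:]:
            last_start, last_end = out[-1]
            if start <= last_end:  # overlaps the previous tract: extend it
                out[-1] = (last_start, max(last_end, end))
            else:
                out.append((start, end))
        merged[pair] = out
    return merged
```

Given the tracts `("A", "B", 10, 50)` and `("B", "A", 40, 80)`, the sketch reports a single merged tract covering markers 10 to 80 for the pair A and B.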

Similar IBD analyses have already been carried out by other genome bloggers.
Dienekes Pontikos has performed several fastIBD analyses to detect the degree of sharing between various Eurasian and African groups. Davidski of the Eurogenes project has also used the fastIBD method in his intra-European chromosome paintings.

Inspired by those analyses, I decided to run an IBD sharing analysis of the World-22 calculator dataset using the robust and powerful fastIBD software (with an increased IBD detection threshold) and the ibd2segment script.

I created an ad hoc subset of various East-European populations by including samples from the following populations:
 
Mordovian
Sorb
Hungarian
Belarusian
Tatar
Lithuanian
Polish
Bosnian
Ukrainian
Slovakian
Nogai
Serbian
Estonian
German
Swedish
Macedonian
Latvian
Moldavian
Montenegrin
Bulgarian
NorthOssetian
Kazakh
Slovenian
Uzbek
Adygei
Armenian
British
Czech
Orcadian
Russian
Turk

I calculated the sum of IBD shared segments (measured in centimorgans, cM) for each pair of populations. The resulting matrix of pairwise sharing is visualized in the following heat maps. The populations in the first heat map are clustered according to z-score values: high values indicate a high degree of IBD sharing, while low values indicate a low degree of IBD sharing.
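As a rough illustration of this summation, the following Python sketch adds up segment lengths per pair of populations and then standardizes the totals. The input layout, the sample-to-population lookup, and both function names are hypothetical. Standardizing across all population pairs at once is also an assumption, since the z-scores behind the heat map may have been computed per row instead.

```python
from collections import defaultdict
from statistics import mean, pstdev


def pairwise_sharing(segments, population_of):
    """Sum IBD segment lengths (cM) for each unordered pair of populations.

    `segments` holds (sample_a, sample_b, length_cm) tuples and
    `population_of` maps a sample ID to its population label
    (both layouts are assumptions for this sketch).
    """
    totals = defaultdict(float)
    for sample_a, sample_b, length_cm in segments:
        pair = tuple(sorted((population_of[sample_a], population_of[sample_b])))
        totals[pair] += length_cm
    return dict(totals)


def z_scores(totals):
    """Standardize the totals across all population pairs (an assumption)."""
    values = list(totals.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return {pair: 0.0 for pair in totals}  # no variation to scale by
    return {pair: (value - mu) / sigma for pair, value in totals.items()}
```

A pair with a positive z-score shares more cM than the average pair in the table, and a pair with a negative z-score shares less.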

 

In the second heat map, the tree-like hierarchical grouping of populations is based on the total length (in cM) of the segments shared by each pair of populations.
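To show how a tree can be derived from pairwise totals, here is a small Python sketch of average-linkage agglomerative clustering that repeatedly joins the two groups with the highest mean sharing. The linkage rule, the function name `cluster_by_sharing`, and the input layout are illustrative assumptions; the dendrogram in the heat map may have been built with a different distance measure or linkage.

```python
def cluster_by_sharing(names, sharing):
    """Build a merge tree by repeatedly joining the most-sharing clusters.

    `names` lists the population labels and `sharing[(a, b)]` gives the total
    cM shared by populations a and b (a hypothetical layout; missing pairs
    count as 0). Returns the tree as nested tuples.
    """
    def sim(a, b):
        return sharing.get((a, b), sharing.get((b, a), 0.0))

    # Each cluster is (tree, members); start with one cluster per population.
    clusters = [(name, [name]) for name in names]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                members_i, members_j = clusters[i][1], clusters[j][1]
                total = sum(sim(a, b) for a in members_i for b in members_j)
                avg = total / (len(members_i) * len(members_j))
                if best is None or avg > best[0]:
                    best = (avg, i, j)
        _, i, j = best
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]
```

Populations that share the most cM are joined first, so they end up deepest in the nested tuples, mirroring the lowest branches of a dendrogram.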

I've also made some visualizations of IBD sharing for selected East-European populations.




UPDATE I: I've uploaded a spreadsheet with the IBD pairwise-sharing values to Google Drive.

1 comment:

  1. How do you interpret the results? In part of my results it has Near East and West Asia, what countries would that be considered?
    Pygmy -
    West-Asian 6.06 Pct
    North-European-Mesolithic -
    Indo-Tibetan 1.46 Pct
    Mesoamerican 6.39 Pct
    Arctic-Amerind -
    South-America_Amerind 2.21 Pct
    Indian 0.33 Pct
    North-Siberean -
    Atlantic_Mediterranean_Neolithic 31.6 Pct
    Samoedic -
    Indo-Iranian 2.29 Pct
    East-Siberean 0.73 Pct
    North-East-European 25.97 Pct
    South-African -
    North-Amerind 5.44 Pct
    Sub-Saharian 7.89 Pct
    East-South-Asian -
    Near_East 9.38 Pct
    Melanesian -
    Paleo-Siberian -
    Austronesian 0.24 Pct
