Saturday, September 29, 2012

A quick update to SupportMix's Chromosome Painting

A thoughtful reader of our blog noticed that some of the chromosome paintings (Chr 5 set 9; Chr 7 sets 4 and 5; Chr 9 set 7; Chr 11 set 4) were missing from the original tar.gz bundle distributed via Google Data Drive. I have re-uploaded the archive with the missing files included; the new location of the archive is here.

Additional experiment.

In addition to that quick fix, I decided to test the accuracy of SupportMix's chromosome paintings by juxtaposing them with the MDLP-World22 chromosome graphs. Due to time limitations, I used only the first 7 chromosomes of my own SNP data. First, I ran the MDLP-World22 modification of DIYDodecad v2.1 in byseg mode on "windows" of 500 contiguous SNPs along each chromosome, slid in increments of 50 SNPs. Then I cut out the painting of each chromosome from SupportMix's graphic output and aligned it to the scale of the corresponding DIYDodecad chromosome graph:








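The window scheme described above (500 contiguous SNPs, shifted by 50 SNPs at a time) and the alignment of a painting to a graph's scale can be sketched in Python. This is only an illustration, not DIYDodecad's own code: the function names are hypothetical, I assume that a trailing partial window at the end of a chromosome is simply dropped, and I assume that each window is placed on the chromosome at the physical position of its middle SNP. `to_pixel` is a hypothetical linear rescaling of a base-pair position onto an image of a given width.

```python
def sliding_windows(n_snps, window=500, step=50):
    """Yield (start, end) SNP index pairs (end exclusive) for windows of
    `window` contiguous SNPs, advanced by `step` SNPs each time.
    Assumption: a trailing partial window is dropped."""
    start = 0
    while start + window <= n_snps:
        yield start, start + window
        start += step


def window_midpoints(positions, window=500, step=50):
    """Place each window on the chromosome at the bp position of its middle SNP.
    `positions` is a sorted list of SNP positions (bp) for one chromosome."""
    return [positions[(s + e) // 2]
            for s, e in sliding_windows(len(positions), window, step)]


def to_pixel(pos_bp, chrom_start, chrom_end, width_px):
    """Hypothetical linear mapping of a bp position onto an image that is
    `width_px` pixels wide and spans chrom_start..chrom_end."""
    if chrom_end <= chrom_start:
        raise ValueError("chrom_end must be greater than chrom_start")
    frac = (pos_bp - chrom_start) / (chrom_end - chrom_start)
    return round(frac * (width_px - 1))


# Example: 600 SNPs spaced 10 bp apart give three windows.
positions = list(range(0, 6000, 10))
print(list(sliding_windows(len(positions))))  # window index ranges
print(window_midpoints(positions))            # bp position of each window
```

Placing both outputs on a common bp axis is what makes a byseg window and a stretch of the SupportMix painting comparable by eye.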
After a preliminary evaluation of the results, I noticed an approximate correlation between the byseg output of the MDLP World22 DIY calculator and SupportMix for the two major "components" in my genome (North-East-European and Atlantic-Mediterranean). Moreover, the "Near-Eastern segments" assigned by SupportMix partially overlap with the peaks of "Near-East segments" in the DIYDodecad output. However, the picture for the minor components is much less clear. The lack of correlation for the minor components could be explained by several factors:

1) DIYDodecad operates on unphased raw genotype data;
2) DIYDodecad does not take genetic distance or recombination into account;
3) last, but not least: small segments may appear noisier than they are, because a particular region may contain no informative SNPs that distinguish between some of the minor ancestral components (an observation by Dienekes Pontikos).
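Point 3 can be made concrete by counting how many "informative" SNPs fall into each window. The sketch below is purely illustrative: the allele frequencies are assumed inputs for two hypothetical reference components, and I use the absolute allele-frequency difference with an arbitrary cutoff (`min_diff=0.2`) as a crude informativeness measure. This is not how DIYDodecad or SupportMix measure informativeness.

```python
def informative_mask(freq_a, freq_b, min_diff=0.2):
    """Flag SNPs whose allele frequencies differ by at least `min_diff`
    between two (hypothetical) reference components.
    Assumption: absolute frequency difference is a crude proxy for informativeness."""
    if len(freq_a) != len(freq_b):
        raise ValueError("frequency lists must have the same length")
    return [abs(fa - fb) >= min_diff for fa, fb in zip(freq_a, freq_b)]


def informative_per_window(mask, window=500, step=50):
    """Count flagged SNPs in each sliding window (trailing partial window dropped)."""
    counts = []
    start = 0
    while start + window <= len(mask):
        counts.append(sum(mask[start:start + window]))
        start += step
    return counts


# Toy example: only the last two SNPs separate the two components.
mask = informative_mask([0.1, 0.5, 0.9], [0.15, 0.2, 0.1])
print(mask)
```

A window where this count is near zero for a pair of minor components gives any method little signal to separate them, which is one way to read the noisy small segments.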




UPDATE: At first I thought it would be a great idea to calculate a correlation coefficient between the 'byseg' output and SupportMix's Tprobs output file. But it seems that the results are not directly comparable: the assignment of segments in 'byseg' is measured in frequencies, while the assignment of segments in SupportMix is expressed as a probability of assignment. If someone has a solution to this problem, please let me know.
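One possible (and by no means definitive) way to put the two outputs on a common footing is to average SupportMix's per-SNP probabilities for one component over the same 500/50 SNP windows that byseg uses, and then compute a Pearson correlation between the two window series. The sketch assumes the Tprobs values for one component have already been extracted into a list in the same SNP order as the byseg input; it does not parse the actual Tprobs file format. Pearson's r is unchanged by positive linear rescaling of either series, so the difference in units alone does not block the calculation, although frequencies and probabilities remain different quantities.

```python
import math


def window_means(values, window=500, step=50):
    """Average per-SNP values (e.g. assumed SupportMix probabilities for one
    component) over sliding windows; trailing partial window is dropped."""
    means = []
    start = 0
    while start + window <= len(values):
        means.append(sum(values[start:start + window]) / window)
        start += step
    return means


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Usage sketch with made-up numbers (not real results):
# byseg_component = [...]  # one value per byseg window
# supportmix_probs = [...] # one probability per SNP
# r = pearson(byseg_component, window_means(supportmix_probs))
```

If a linear relationship seems too strong an assumption, ranking both series first (Spearman's rho) is a common alternative.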


