
Thursday, May 19, 2011

RHH mapper: results for V158-V165 and V201-V202

RHHmapper allows users to visualize chromosomal mosaicism in admixed individuals descended from two genetically distinct groups.
Chromosomal mosaicism is when different cells within an individual, who has developed from a single fertilized egg, have a different chromosomal makeup. Most commonly there will be some cells with a typical number of chromosomes (46 chromosomes) and other cells with an altered number or structure of chromosomes.

RHH Counter (Rare Heterozygotes and Homozygotes) is a command-line application designed to display chromosomal mosaicism in admixed individuals descended from two genetically distinct groups. The software also detects "outliers" whose ancestry is different or admixed compared to most other subjects in a dataset.

I wouldn't mind writing an extensive description of this tool, but for lack of time I would rather borrow a few words from the editor's description. RHHcounter has also already been discussed by Davidski on his Eurogenes blog (where he explains how to get the most out of the RHHcounter/RHHmapper data using SPSmart).

Despite its evident limitations (which have been made explicit by certain persons), the method could be useful for detecting both remote signals of ancient admixture events and more recent "geneographic" traces. For the latter type of analysis, we need some heuristic method to extrapolate the size of shared segments in 23andme's Ancestry Finder.

To illustrate the concept, I will use the Ancestry Finder data of my own family members:


LKH (VV's mother)

Next, we need to compare RHH mappings, looking for rare hetero- or homozygotes shared between those two family members.

The RH shared by VV and LKH is indicated by a blue mark in the Chr.4 track. A blue mark indicates that a rare homozygous genotype was observed in this sample at this position. In most cases, rare homozygotes are less common than rare heterozygotes. However, the presence of rare homozygotes can be informative of the level of admixture in a particular chromosomal region. VV (LKH's son) also has rare heterozygotes on Chr.7 and Chr.13.

The next logical step is to compare the detected RHH to Ancestry Finder's data. Since the exact bounds of the shared IBD segments (as shown in AF) are uncertain, we must search for the segments (and the geneographic data associated with them) whose physical locations are closest to the RHH's location. For instance, in our example one may be tempted to conclude that the RHH can be attributed to the extended gene pool of Polish, Hungarian, Lithuanian and Czech populations.

The complete set of chromosomal mosaicisms for all of the project's participants is available for download here.

P.S. More interesting, however, is that the combination of rare heterozygotes on Chr.7 (between 110Mb and 120Mb, ±10Mb) and Chr.13 (between 40Mb and 50Mb, ±10Mb) tends to be prevalent in Uzbeks and Turks (the only exceptions being two Romanians). Even more interesting is the fact that one of VV's great-great-great-grandfathers came from a noble Tatar family which had been settled in the Grand Duchy of Lithuania since before 1528.
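The "closest segment" search described above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the `Segment` record, the coordinates and the labels are all hypothetical and are not taken from Ancestry Finder's export format or from the project's data.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """A shared IBD segment (hypothetical record layout, not AF's real format)."""
    chrom: str
    start: int  # approximate start position in base pairs
    end: int    # approximate end position in base pairs
    label: str  # geneographic annotation attached to the match


def distance_to_segment(pos, seg):
    """Return 0 if pos lies inside the segment, else the gap to its nearest edge."""
    if seg.start <= pos <= seg.end:
        return 0
    return min(abs(pos - seg.start), abs(pos - seg.end))


def nearest_segment(chrom, pos, segments):
    """Pick the segment on the same chromosome that lies closest to an RHH position."""
    candidates = [s for s in segments if s.chrom == chrom]
    if not candidates:
        return None
    return min(candidates, key=lambda s: distance_to_segment(pos, s))


# Made-up coordinates, purely for illustration.
segments = [
    Segment("4", 10_000_000, 18_000_000, "hypothetical match A"),
    Segment("4", 95_000_000, 103_000_000, "hypothetical match B"),
    Segment("7", 110_000_000, 120_000_000, "hypothetical match C"),
]

hit = nearest_segment("4", 90_000_000, segments)
print(hit.label)
```

Because the segment bounds are uncertain, a distance of zero should not be read as proof that the RHH lies inside the segment; the sketch only ranks candidates by proximity.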

Monday, May 16, 2011

The hierarchy of clusters

Hierarchical layouts of clusters

GRR analysis to detect pairwise allele sharing/project's outliers

Parents outlying pairs (n=1) from rgGRR

Mean Sdev ZMean ZSdev FID1 IID1 FID2 IID2 RelMean_M RelMean_SD RelSD_M RelSD_SD PID1 MID1 PID2 MID2 Ped
1.668269 0.470883 19.153252 -20.508307 Orcadian HGDP00794 Orcadian HGDP00801 1.449923 0.011400 0.620135 0.007278 0 0 0 0 parents

Look at GRR's svg presentation here

PCA plots (Eigensoft)




C1-C4 (with an alpha filter applied in order to simulate 3D "depth")

Facet plot with free scales for the members of each presented population

Admixture results for V159-V165 and V201-V202

Here are the results of admixture analysis for participants V159-V165 and V201-V202, compared to reference populations.

One of the most important problems in admixture analysis, how to choose a reasonable number of clusters (K), is still up in the air. David H. Alexander, John Novembre and Kenneth Lange suggest using ADMIXTURE's cross-validation procedure. A good value of K will exhibit a low cross-validation error compared to other K values. Cross-validation is enabled by simply adding the --cv flag to the ADMIXTURE command line. In this default setting, the cross-validation procedure will do 10 repetitions, each time holding out 10% of the genotypes at random. The cross-validation error is reported in the screen output: the larger the cross-validation error, the less reasonable the corresponding choice of K.

In our example, one can easily see that K=5 is the most reasonable modeling choice among the values tested, since it has the lowest CV error:

CV error (K=5): 0.61144 (0.00006)
CV error (K=7): 0.61429 (0.00008)
CV error (K=8): 0.61555 (0.00007)
CV error (K=9): 0.61845 (0.00007)
CV error (K=10): 0.62159 (0.00006)
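For anyone who wants to pick K automatically rather than by eye, here is a minimal Python sketch that parses the "CV error" lines listed above and returns the K with the lowest error. The function names and the regular expression are my own; they are not part of ADMIXTURE. The sketch assumes the lines were collected from the screen output into a single string.

```python
import re

# The CV error lines reported for this project's runs (copied from above).
LOG_TEXT = """\
CV error (K=10): 0.62159 (0.00006)
CV error (K=5): 0.61144 (0.00006)
CV error (K=7): 0.61429 (0.00008)
CV error (K=8): 0.61555 (0.00007)
CV error (K=9): 0.61845 (0.00007)
"""

# Captures K and the first number after the colon (the CV error itself);
# the trailing value in parentheses is ignored.
CV_PATTERN = re.compile(r"CV error \(K=(\d+)\): ([0-9.]+)")


def parse_cv_errors(text):
    """Map each K found in the text to its cross-validation error."""
    return {int(k): float(err) for k, err in CV_PATTERN.findall(text)}


def best_k(cv_errors):
    """Return the K with the lowest cross-validation error."""
    return min(cv_errors, key=cv_errors.get)


cv_errors = parse_cv_errors(LOG_TEXT)
for k in sorted(cv_errors):
    print(f"K={k}: {cv_errors[k]:.5f}")
print("lowest CV error at K =", best_k(cv_errors))
```

Note that only K = 5, 7, 8, 9 and 10 were run here, so the minimum is relative to those values only.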





MDS plot

MDS plot for N=273

Admixture K=5

Admixture unsupervised run for K=8