Dienekes was kind enough to share his C++ code for converting the 1000 Genomes dataset to PLINK format. Since many participants in our project have Central European and British ancestry, I have added the converted GBR dataset (“British from England and Scotland”), along with the Orcadian sample from the HGDP project and the CEU panel from the HapMap project. These samples proved useful in delineating NWE ancestry in the participants of our project.
I have also added Finns from the 1000 Genomes project to assess the possible North-Western Asian component in the reference set for the Russian population.
Here are the results of the unsupervised and supervised ADMIXTURE runs on the pruned MDLP datasets. I have intentionally left the inferred components unlabeled, since there is still considerable disagreement among genomic bloggers as to what these components might be.
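For readers who want to try a supervised run themselves: as far as I understand the ADMIXTURE manual, supervised mode expects a `.pop` file next to the input, with one line per individual in the same order as the `.fam` file, giving a population label for reference samples and `-` for everyone else. Here is a minimal Python sketch of building those labels. The function name, the sample IDs and the population mapping are all made up for illustration and are not taken from the MDLP dataset.

```python
def build_pop_labels(fam_lines, reference_pops):
    """Return one label per .fam row: the population name for
    reference samples, '-' for samples of unknown ancestry."""
    labels = []
    for line in fam_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed rows
        individual_id = fields[1]  # second column of a PLINK .fam file
        labels.append(reference_pops.get(individual_id, "-"))
    return labels


# Hypothetical .fam rows: family ID, individual ID, father, mother, sex, phenotype
fam = [
    "GBR REF_GBR_01 0 0 1 -9",
    "FIN REF_FIN_01 0 0 2 -9",
    "MDLP PARTICIPANT_01 0 0 0 -9",
]
# Hypothetical mapping of reference sample IDs to population labels
refs = {"REF_GBR_01": "GBR", "REF_FIN_01": "FIN"}

pop_labels = build_pop_labels(fam, refs)
print("\n".join(pop_labels))
```

The printed lines could then be saved under the same prefix as the `.bed` file with a `.pop` extension before starting ADMIXTURE with its `--supervised` flag.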
Unsupervised ADMIXTURE run:
Supervised ADMIXTURE run:
PS. I also tried to run the STRUCTURE software on the whole MDLP dataset (a file of circa 145 MB in STRUCTURE input format), but this trial run ended in disaster and the suspension of one of my Linux accounts :/ This means that I won't be able to run a STRUCTURE analysis of the whole dataset for a couple of months. Instead, I will try to analyse each chromosome separately.
PPS. Pat Berge has asked me to publish the ADMIXTURE/PCA/MDS data as spreadsheets:
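A rough sketch of how the per-chromosome split could be prepared, assuming the dataset is available as a PLINK binary fileset: the `.bim` file lists one SNP per line with the chromosome in the first column and the SNP ID in the second, so SNP IDs can be grouped by chromosome. The function name and the SNP IDs below are invented for illustration, and this is not necessarily the way I will end up doing it.

```python
from collections import defaultdict


def snps_by_chromosome(bim_lines):
    """Group SNP IDs by chromosome code from PLINK .bim rows."""
    groups = defaultdict(list)
    for line in bim_lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # a .bim row has six columns; skip anything shorter
        chrom, snp_id = fields[0], fields[1]
        groups[chrom].append(snp_id)
    return dict(groups)


# Hypothetical .bim rows: chromosome, SNP ID, cM position, bp position, allele 1, allele 2
bim = [
    "1\tsnp_a\t0\t1000\tA\tG",
    "1\tsnp_b\t0\t2000\tC\tT",
    "2\tsnp_c\t0\t1500\tG\tA",
]

per_chrom = snps_by_chromosome(bim)
for chrom, ids in per_chrom.items():
    print(chrom, len(ids))
```

Each list could be written to a text file and passed to PLINK's `--extract` option; PLINK's own `--chr` option should give the same subset directly, without any scripting.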
ADMIXTURE results
PCA/MDS plot data
This is great data. Can you please post the spreadsheet data?
@bergep
I've done what you asked and added links to the ADMIXTURE and PCA/MDS data.
Thanks. When viewing the ADMIXTURE spreadsheet I only see the reference populations. Could you please add the participants? Thanks in advance.
The MDS data works fine with both the references and the participants.
Hi bergep,
I have fixed the labels in the spreadsheet.
Please let me know if it doesn't work.