I just wasted the past two weeks thinking about a new method for visualising the scope of IBD segments' "sharedness" between the different populations in the project. Personally, I consider PLINK-based inference of overlapping IBD segments to be very useful for the project's purposes. PLINK has functions to detect specific segments shared between distantly related individuals; for discovering long shared IBD segments, one can use the GERMLINE algorithm.
As mentioned in PLINK's manual, the --segment option on the PLINK command line generates a file plink.segment with the following fields (one row represents one segment between two compared individuals):

FID1    Family ID of first individual
IID1    Individual ID of first individual
FID2    Family ID of second individual
IID2    Individual ID of second individual
PHE     Phenotype concordance: -1,0,1
CHR     Chromosome code
BP1     Start physical position of segment (bp)
BP2     End physical position of segment (bp)
SNP1    Start SNP of segment
SNP2    End SNP of segment
NSNP    Number of SNPs in this segment
KB      Physical length of segment (kb)

To get an idea of what an IBD segments file may look like, take a look at an example of a PLINK segment file, which lists the estimated IBD sharing segments between MDL project participants (download link).
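For orientation, here is a minimal sketch of reading a file in this layout into R and summing the shared length per pair of individuals. The three rows are made up for illustration (they are not project data), and I am assuming the file is whitespace-delimited with a header row:

```r
# Made-up rows in the plink.segment layout (hypothetical IDs and positions);
# for a real run, pass the path to your own file instead of `text = ...`.
example <- "FID1 IID1 FID2 IID2 PHE CHR BP1 BP2 SNP1 SNP2 NSNP KB
F1 I1 F2 I2 -1 1 1000000 3500000 rs1 rs2 250 2500
F1 I1 F2 I2 -1 7 2000000 3000000 rs3 rs4 120 1000
F1 I1 F3 I3 -1 2 5000000 6200000 rs5 rs6 140 1200"
seg <- read.table(text = example, header = TRUE, stringsAsFactors = FALSE)

# Total shared length (kb) per pair of individuals
pair_kb <- aggregate(KB ~ IID1 + IID2, data = seg, FUN = sum)
pair_kb
```

A per-pair total like this could later serve as an edge weight, though that is only one possible choice.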
The first thing that came to my mind when I was tackling the IBD OVERLAP and INDIV file formats was to read the IBD sharing data as graph objects. At least, I have not figured out a better solution, and the graph-analytical approach is very straightforward. Suppose you want to read a shared segments file in R, assuming that the graph of sharedness is undirected. Then the R routine for reading in the file and converting it to an igraph object is very simple:

library(igraph)
segments <- read.csv("IBD.shared.segments", header = FALSE)
# the first two columns are taken as the two endpoints of each edge
g <- graph_from_data_frame(segments, directed = FALSE)
vcount(g)  # returns the number of vertices
ecount(g)  # returns the number of edges

Given a graph object (read in and transformed from the PLINK segment file format) and a set of graph manipulation rules, one can do basically whatever one wants with the graph. For instance, one can find the largest “clique” of the graph and create a new graph from that largest clique. A clique is a subgraph in which every vertex connects to every other vertex:

lc <- largest_cliques(g)               # list of the largest cliques
g.lc <- induced_subgraph(g, lc[[1]])   # graph built from the first one
One can also assign uniformly random weights to the edges, set the color edge attribute to “red” for edges with a high weight, find the shortest path between two nodes of the graph, etc.
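A minimal sketch of those three steps, assuming a small toy graph built in place. The vertex names, the grey fallback colour and the 0.5 threshold for a "high" weight are made up for illustration and are not taken from the project data:

```r
library(igraph)

# Toy edge list standing in for pairs of individuals sharing a segment
# (hypothetical IDs, not project data)
pairs <- data.frame(from = c("A", "A", "B", "C", "D"),
                    to   = c("B", "C", "C", "D", "E"))
g <- graph_from_data_frame(pairs, directed = FALSE)

# Assign uniformly random weights to the edges
set.seed(1)
E(g)$weight <- runif(ecount(g))

# Colour edges with a high weight red, all other edges grey
E(g)$color <- ifelse(E(g)$weight > 0.5, "red", "grey")

# Shortest path between two nodes; the weight attribute is used by default
sp <- shortest_paths(g, from = "A", to = "E")
path_names <- as_ids(sp$vpath[[1]])
path_names
```

Calling plot(g) afterwards should pick up the color attribute when drawing the edges.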
More details about igraph objects can be found on igraph's homepage.
Consider a real-life example: a graph representing the amount of sharedness between project participants. What can we learn from a graphical representation of the project's "communities" based on the amount of IBD sharedness? In our particular case, we can easily identify the patterns of "IBD sharedness" within and between populations. For example, there are two "IBD communities" in the Finnish "population sample" in our project (black circles on the graph). These two enigmatic "communities" appear to be "things in themselves", each centred on one of two sharing "midpoints" in the Finnish sample (V151 and V156). V151 and V156 also appear to be the only Finns linked through IBD segment sharing to North_Russians and, to a lesser extent, to the Balto-Slavic cluster.
That looks pretty cool.