Tuesday, May 3, 2011

Grapho-analytical approach to the visualisation of IBD shared segments

I just wasted the past two weeks thinking about a new method for visualising the extent of IBD segment "sharedness" between the different populations in the project. Personally, I consider PLINK-based inference of overlapping IBD segments to be very useful for the project's purposes. PLINK has functions to detect specific segments shared between distantly related individuals; for discovering long shared IBD segments, one can use the GERMLINE algorithm.


As mentioned in PLINK's manual, the --segment option on the PLINK command line generates a file, plink.segment, with the following fields (one row represents one segment shared between two compared individuals):

     FID1       Family ID of first individual
     IID1       Individual ID of first individual
     FID2       Family ID of second individual
     IID2       Individual ID of second individual
     PHE        Phenotype concordance: -1,0,1
     CHR        Chromosome code
     BP1        Start physical position of segment (bp)
     BP2        End physical position of segment (bp)
     SNP1       Start SNP of segment
     SNP2       End SNP of segment
     NSNP       Number of SNPs in this segment
     KB         Physical length of segment (kb)
To get an idea of what an IBD segments file looks like, you may want to take a look at an example PLINK segment file, which includes a list of estimated IBD sharing segments between MDL project participants (download link).
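As a rough illustration of the layout described above, here is a minimal sketch in base R that reads a segment table and sums the shared length per pair of individuals. The three sample rows, the IDs (A1, B1, C1) and the variable names are made up for the example; a real plink.segment file would be read from disk instead of from an inline string.

```r
# Hypothetical three-row sample in plink.segment layout (all values are made up).
sample_lines <- "FID1 IID1 FID2 IID2 PHE CHR BP1 BP2 SNP1 SNP2 NSNP KB
F1 A1 F2 B1 -1 1 1000000 3500000 rs1 rs2 250 2500
F1 A1 F2 B1 -1 2 2000000 3000000 rs3 rs4 120 1000
F1 A1 F3 C1 -1 1 5000000 6200000 rs5 rs6 140 1200"

# The file is whitespace-delimited with a header row, so read.table is a natural fit.
# For a real file, replace `text = sample_lines` with the file path.
seg <- read.table(text = sample_lines, header = TRUE, stringsAsFactors = FALSE)

# Total shared length (kb) for each pair of individuals, one row per pair.
pair_kb <- aggregate(KB ~ IID1 + IID2, data = seg, FUN = sum)
print(pair_kb)
```

Since pair_kb has one row per pair with the two individual IDs in its first two columns, it can be used as an edge list, with KB carried along as an edge attribute.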

The first thing that came to my mind when I was tackling the IBD OVERLAP and INDIV file formats was to read the IBD sharing data as graph objects. At least, I have not figured out a better solution, and the grapho-analytical approach is very straightforward. Suppose you want to read a shared segments file into R, assuming that the graph of sharedness is undirected. Then the R routine for reading in the file and converting it to an igraph object is very simple:

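library(igraph)
# Read the shared-segments table; the first two columns are used as the two endpoints of each edge
segments <- read.csv("IBD.shared.segments", header = FALSE)
g <- graph.data.frame(segments, directed = FALSE)
vcount(g)
# returns the number of vertices
ecount(g)
# returns the number of edges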
Given a graph object (read in and transformed from the PLINK segment file format) and a set of graph manipulation rules, one can do with the graph basically whatever one intends to do. For instance, one can find the largest “clique” of the graph and create a new graph from that largest clique. A clique is a subgraph in which every vertex is connected to every other vertex:

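# largest.cliques returns a list with one vector of vertices per largest clique
lc <- largest.cliques(g)
# keep only the vertices of the first largest clique
g.lc <- subgraph(g, lc[[1]])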

One can also assign uniformly random weights to the edges, set the color edge attribute to “red” for edges with high weight, find the shortest path between two nodes of the graph, etc.
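These manipulations can be sketched roughly as follows. This is only an illustration under stated assumptions: the toy edge list, the 0.9 cutoff for a "high" weight and the grey fallback color are all made up, and the sketch uses the current underscore-style igraph names (graph_from_data_frame, shortest_paths) rather than the older dotted ones.

```r
library(igraph)

# Hypothetical toy edge list standing in for pairs that share an IBD segment.
edges <- data.frame(
  from = c("A", "A", "B", "C", "D"),
  to   = c("B", "C", "C", "D", "E")
)
g <- graph_from_data_frame(edges, directed = FALSE)

set.seed(1)                      # make the random weights reproducible
E(g)$weight <- runif(ecount(g))  # one uniform random weight per edge

# Mark high-weight edges in red; 0.9 is an arbitrary cutoff chosen for the example.
E(g)$color <- ifelse(E(g)$weight > 0.9, "red", "grey")

# Shortest path between two vertices; the "weight" edge attribute is used by default.
sp <- shortest_paths(g, from = "A", to = "E")
path_names <- as_ids(sp$vpath[[1]])
print(path_names)
```

Calling plot(g) afterwards would draw the edges with the colors assigned above.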

More details about igraph objects can be found on the igraph homepage.

Consider a real-life example: a graph representing the amount of sharedness between project participants. What can we learn from the graphical representation of the project's "communities" based on the amount of IBD sharedness? In our particular case we can easily identify the patterns of "IBD sharedness" within and between populations. For example, there are two "IBD communities" in the Finnish "population sample" in our project (black circles on the graph). These two enigmatic "communities" appear to be "things in themselves", each centred on one of two sharing "midpoints" in the Finnish sample (V151 and V156), while V151 and V156 appear to be the only Finns linked by IBD segment sharing to the North_Russians and, to a lesser extent, to the Balto-Slavic cluster.

That looks pretty cool.
