
In this document, we present how to go from trio information to full families that can be used to calculate kinship matrices. By trio information, we specifically mean knowing the id of the child and the ids of the child’s mother and father. Kinship matrices are essential when estimating liabilities with the estimate_liability() function of the package. This addition helps with the process of identifying related individuals and the subsequent construction of the kinship matrix.

From trio information to graph

The trio information can be used to create extended families manually by first identifying parents, grandparents, great-grandparents, etc. From there, siblings, aunts and uncles, cousins, etc. can also be identified. However, this is a tedious process, and it is easy to miss family members. We have developed a function that finds all family members related to a given individual within degree n, without relying on identifying each family role manually.
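To illustrate why the manual route becomes tedious, here is a minimal base R sketch that looks up parents, grandparents, and full siblings one role at a time. The toy table and the helper get_parents() are hypothetical and not part of the package; every further role (aunts, cousins, half-siblings) would need its own lookup.

```r
# Toy trio table (hypothetical subset of a family)
trio <- data.frame(
  id     = c("pid", "sib", "mom", "dad"),
  momcol = c("mom", "mom", "mgm", "pgm"),
  dadcol = c("dad", "dad", "mgf", "pgf"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: parents of a set of ids, looked up in the trio table
get_parents <- function(ids, trio) {
  rows <- trio[trio$id %in% ids, ]
  unique(c(rows$momcol, rows$dadcol))
}

# One lookup per generation
parents      <- get_parents("pid", trio)
grandparents <- get_parents(parents, trio)

# Full siblings: same mother and same father as the index person
me <- trio[trio$id == "pid", ]
siblings <- trio$id[trio$momcol == me$momcol &
                    trio$dadcol == me$dadcol &
                    trio$id != "pid"]
```

Each new relationship type adds another hand-written rule, which is the bookkeeping the graph-based approach avoids.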

Below is an example data set of a family. It contains half-siblings, half-aunts and -uncles, as well as cousins and individuals that have married into the family. Examples of the naming convention are mgm, meaning maternal grandmother; hspaunt, meaning paternal half-aunt; and hsmuncleW, meaning maternal half-uncle’s wife.

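library(dplyr)    # tribble(), tibble(), %>%, select(), case_when(), mutate()
library(igraph)   # make_ego_graph(), layout_as_tree
library(LTFHPlus) # prepare_graph(), get_kinship(), graph_to_trio()

family = tribble(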
  ~id, ~momcol, ~dadcol,
  "pid", "mom", "dad",
  "sib", "mom", "dad",
  "mhs", "mom", "dad2",
  "phs", "mom2", "dad",
  "mom", "mgm", "mgf",
  "dad", "pgm", "pgf",
  "dad2", "pgm2", "pgf2",
  "paunt", "pgm", "pgf",
  "pacousin", "paunt", "pauntH",
  "hspaunt", "pgm", "newpgf",
  "hspacousin", "hspaunt", "hspauntH",
  "puncle", "pgm", "pgf",
  "pucousin", "puncleW", "puncle",
  "maunt", "mgm", "mgf",
  "macousin", "maunt", "mauntH",
  "hsmuncle", "newmgm", "mgf",
  "hsmucousin", "hsmuncleW", "hsmuncle"
)

thrs =  tibble(
 id = family %>% select(1:3) %>% unlist() %>% unique(),
 lower = sample(c(-Inf, 2), size = length(id), replace = TRUE),
 upper = sample(c(2, Inf), size = length(id), replace = TRUE),
 sex = case_when(
   id %in% family$momcol ~ "F",
   id %in% family$dadcol ~ "M",
   TRUE ~ NA_character_)) %>% 
  mutate(sex = sapply(sex, function(x) ifelse(is.na(x), sample(c("M", "F"), 1), x)))

The object family is meant to represent the trio information that can be found in registers. It is possible to have multiple families in the same input data or single individuals with no family links.

graph = prepare_graph(.tbl = family, 
                      node_attributes = thrs,
                      fcol = "dadcol",
                      mcol = "momcol",
                      icol = "id")
graph
## IGRAPH 715124f DN-- 31 44 -- 
## + attr: name (v/c), lower (v/n), upper (v/n), sex (v/c)
## + edges from 715124f (vertex names):
##  [1] dad     ->pid        mom     ->pid        dad     ->sib       
##  [4] mom     ->sib        dad2    ->mhs        mom     ->mhs       
##  [7] dad     ->phs        mom2    ->phs        mgf     ->mom       
## [10] mgm     ->mom        pgf     ->dad        pgm     ->dad       
## [13] pgf2    ->dad2       pgm2    ->dad2       pgf     ->paunt     
## [16] pgm     ->paunt      pauntH  ->pacousin   paunt   ->pacousin  
## [19] newpgf  ->hspaunt    pgm     ->hspaunt    hspauntH->hspacousin
## [22] hspaunt ->hspacousin pgf     ->puncle     pgm     ->puncle    
## + ... omitted several edges

The object graph is a directed graph constructed from the trio information in family and is built using the igraph package. The direction in the graph is from parent to offspring.
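As a rough illustration of the idea, and not a description of how prepare_graph() is implemented, the sketch below builds a parent-to-offspring graph directly with igraph from a hypothetical trio table. It contains two unrelated families and one individual without any links. The printed graph above lists 44 edges for 17 trio rows, so prepare_graph() evidently adds edges beyond the 34 parent-offspring links; the sketch only covers the parent-offspring part.

```r
library(igraph)

# Hypothetical trio table with two unrelated families
trio <- data.frame(
  id     = c("pid", "mom", "kid2"),
  momcol = c("mom", "mgm", "mom_b"),
  dadcol = c("dad", "mgf", "dad_b"),
  stringsAsFactors = FALSE
)

# One edge per parent-child link, pointing from parent to child
edges <- rbind(
  data.frame(from = trio$dadcol, to = trio$id),
  data.frame(from = trio$momcol, to = trio$id)
)

# "loner" has no recorded relatives, so it only appears as a vertex
vertices <- data.frame(name = unique(c(edges$from, edges$to, "loner")))

g <- graph_from_data_frame(edges, directed = TRUE, vertices = vertices)

# Weakly connected components correspond to separate families
n_families <- components(g, mode = "weak")$no
```

Here n_families counts the two families plus the isolated individual as three components.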

From graph to subgraph and kinship matrix

We can construct a kinship matrix from all family members present in family, or we can consider only the family members within degree n of a given individual. We can identify the family members within degree 2 of pid like this:

# make_ego_graph returns list, even for node input of length 1
fam_graph = make_ego_graph(graph = graph, 
                           order = 2,
                           nodes = "pid")[[1]]
plot(fam_graph, layout = layout_as_tree,
     vertex.size = 27.5,
     vertex.shape = "rectangle",
     vertex.label.cex = .75,
     edge.arrow.size = .3) 

Plot of the identified pedigree. Pedigree plotted with igraph package.

In particular, individuals such as the paternal uncle’s child (i.e., a cousin, coded as pucousin above) are not present with this relatedness cut-off, as such family members are of degree 3.
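The cut-off can be checked by counting steps in the graph. The sketch below uses a hypothetical slice of the paternal side and assumes that full siblings are linked directly in both directions; this assumption is consistent with the uncle appearing within degree 2 above, but it is not taken from the package documentation.

```r
library(igraph)

# Parent-to-child links for a hypothetical slice of the paternal side
parent_edges <- data.frame(
  from = c("dad", "pgf", "pgm", "pgf", "pgm", "puncle"),
  to   = c("pid", "dad", "dad", "puncle", "puncle", "pucousin")
)

# Assumption: full siblings are linked directly, in both directions
sibling_edges <- data.frame(
  from = c("dad", "puncle"),
  to   = c("puncle", "dad")
)

g <- graph_from_data_frame(rbind(parent_edges, sibling_edges), directed = TRUE)

# mode = "all" ignores edge direction when counting steps from pid
d <- distances(g, v = "pid", mode = "all")[1, ]

# Everyone within two steps of pid
within_2 <- names(d)[d <= 2]
```

Under these assumptions the uncle is two steps from pid and the cousin three, so the cousin falls outside an order 2 neighbourhood.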

Calculate kinship matrix

Finally, the kinship matrix can be calculated with get_kinship() (output made nicer with round) in the following way:

# the kinship matrix is multiplied by 100 and rounded for illustrative purposes!
round(get_kinship(fam_graph, h2 = 1, index_id = "pid", add_ind = FALSE) * 100, 2)
##        pid sib mhs phs mom dad paunt puncle maunt mgm pgm mgf pgf
## pid    100  50  25  25  50  50    25     25    25  25  25  25  25
## sib     50 100  25  25  50  50    25     25    25  25  25  25  25
## mhs     25  25 100   0  50   0     0      0    25  25   0  25   0
## phs     25  25   0 100   0  50    25     25     0   0  25   0  25
## mom     50  50  50   0 100   0     0      0    50  50   0  50   0
## dad     50  50   0  50   0 100    50     50     0   0  50   0  50
## paunt   25  25   0  25   0  50   100     50     0   0  50   0  50
## puncle  25  25   0  25   0  50    50    100     0   0  50   0  50
## maunt   25  25  25   0  50   0     0      0   100  50   0  50   0
## mgm     25  25  25   0  50   0     0      0    50 100   0   0   0
## pgm     25  25   0  25   0  50    50     50     0   0 100   0   0
## mgf     25  25  25   0  50   0     0      0    50   0   0 100   0
## pgf     25  25   0  25   0  50    50     50     0   0   0   0 100
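With h2 = 1, the values above match additive relationship coefficients (twice the kinship coefficient) in percent: 50 for parent-offspring and full-sibling pairs, 25 for grandparents and half-siblings, and 0 for unrelated pairs. As a cross-check, here is a minimal base R sketch of the classical tabular method on a hypothetical pedigree. It is an independent illustration, not a description of how get_kinship() computes its result.

```r
# Hypothetical pedigree, ordered so that parents come before their children;
# NA marks an unknown (founder) parent
ped <- data.frame(
  id  = c("mgm", "mgf", "dad", "dad2", "mom", "pid", "mhs"),
  mom = c(NA, NA, NA, NA, "mgm", "mom", "mom"),
  dad = c(NA, NA, NA, NA, "mgf", "dad", "dad2"),
  stringsAsFactors = FALSE
)

n <- nrow(ped)
A <- matrix(0, n, n, dimnames = list(ped$id, ped$id))

for (i in seq_len(n)) {
  m <- ped$mom[i]
  f <- ped$dad[i]
  # Relationship with each earlier individual: average over the two parents
  for (j in seq_len(i - 1)) {
    a_m <- if (is.na(m)) 0 else A[j, m]
    a_f <- if (is.na(f)) 0 else A[j, f]
    A[i, j] <- A[j, i] <- 0.5 * (a_m + a_f)
  }
  # Diagonal: 1 plus half the relationship between the parents
  inbreeding <- if (is.na(m) || is.na(f)) 0 else 0.5 * A[m, f]
  A[i, i] <- 1 + inbreeding
}

# Same scaling as in the table above
round(A * 100)
```

The entries for pid with mom, mgm, and mhs can be compared directly against the corresponding cells in the matrix above.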

A function called graph_to_trio() has been included in the package, which can convert from the graph object back into a trio object. This function is useful if you want to use the functionality of other packages that rely on trio information. One such example is using the plotting functionality of pedigrees in kinship2.

trio = graph_to_trio(graph = fam_graph, fixParents = TRUE)
trio
## # A tibble: 15 × 4
##    id      momid     dadid     sex  
##    <chr>   <chr>     <chr>     <chr>
##  1 pid     "mom"     "dad"     M    
##  2 sib     "mom"     "dad"     F    
##  3 mhs     "mom"     "added_2" M    
##  4 phs     "added_1" "dad"     F    
##  5 mom     "mgm"     "mgf"     F    
##  6 maunt   "mgm"     "mgf"     F    
##  7 dad     "pgm"     "pgf"     M    
##  8 paunt   "pgm"     "pgf"     F    
##  9 puncle  "pgm"     "pgf"     M    
## 10 mgf     ""        ""        M    
## 11 pgf     ""        ""        M    
## 12 mgm     ""        ""        F    
## 13 pgm     ""        ""        F    
## 14 added_1 ""        ""        F    
## 15 added_2 ""        ""        M

The resulting trio object can be passed to the plotting tools available in the kinship2 package.

pedigree = with(trio, kinship2::pedigree(id = id, dadid = dadid, momid = momid, sex = sex))

plot(pedigree)

Plot of the identified pedigree. Pedigree plotted with kinship2 package.
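The added_1 and added_2 rows in the trio output are placeholder parents: mhs and phs each have only one parent inside the degree 2 subgraph, and, as far as we are aware, kinship2 expects every individual to have either both parents or neither. The base R sketch below shows one way such placeholders could be filled in. The table, the placeholder names, and the logic are hypothetical and are not the code behind the fixParents argument.

```r
# Hypothetical trio table in which two children have only one known parent
trio <- data.frame(
  id    = c("pid", "mhs", "phs", "mom", "dad"),
  momid = c("mom", "mom", NA,    NA,    NA),
  dadid = c("dad", NA,    "dad", NA,    NA),
  sex   = c("M",   "M",   "F",   "F",   "M"),
  stringsAsFactors = FALSE
)

# Flag children with exactly one known parent (computed before any changes)
need_mom <- is.na(trio$momid) & !is.na(trio$dadid)
need_dad <- is.na(trio$dadid) & !is.na(trio$momid)

# Give each of them a placeholder for the missing parent
new_moms <- paste0("placeholder_mom_", seq_len(sum(need_mom)))
new_dads <- paste0("placeholder_dad_", seq_len(sum(need_dad)))
trio$momid[need_mom] <- new_moms
trio$dadid[need_dad] <- new_dads

# Append the placeholders as founders with the matching sex
founders <- data.frame(
  id    = c(new_moms, new_dads),
  momid = NA_character_,
  dadid = NA_character_,
  sex   = c(rep("F", length(new_moms)), rep("M", length(new_dads))),
  stringsAsFactors = FALSE
)
trio <- rbind(trio, founders)
```

After this step every row has either two parents or none, which is the shape a pedigree constructor of this kind can work with.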