vignettes/LTFHPlusGraphExample.Rmd
LTFHPlusGraphExample.Rmd
Manually constructing family history from registers can be tedious.
It can be easily done for first and second degree relatives, however, it
starts to become very difficult once further degrees of relatives needs
to be identified. Other R packages, such as Kinship2 or
FamAgg, allows users to manipulate pedigrees and perform
certain actions and calculations on them, however, to the best of our
knowledge, they do not allow one to find all relatives of degree \(n\) or closer of a given individual. This
particular problem is core to LT-FH++, as the family members of a
proband (often the genotyped individual) will need to be identified as
well as the family members’ mutual kinship coefficient for the
covariance matrix. With the introduction of the function
prepare_graph()
, we have made it possible to automatically
construct families for a list of probands. Below, we will provide an
example of how the function can be used in connection with the
estimate_liability()
function of the package. All data
is simulated and should not be used as-is in any real-world
analysis.
The input is inspired by register information such as the trio
information provided by the danish CPR register. It usually follows the
format of having a column with the child’s ID (here simply
id
), followed by columns for the mother’s and father’s ID
(here momcol
and dadcol
). Here, we provide an
illustrative example of how the register information may look. The names
used are representative of their relation to a proband named “pid”, with
“pgm” refering to paternal grandmother, “mgf” refering to maternal
grandfather, the “hs” prefix means half-sibling, the new parents of
half-siblings will appear with the “H” or “W” suffix for husband or
wife, cousins are prefaced with the first two letters of their parents,
etc..
Additionally, we will also need threshold information for the
phenotype(s) of interest. The information is baked into the graph object
as attributes, and retrieved as needed later by the
estimate_liability()
function. It is possible to construct
the graph without this information and only construct the desired
families.
# example of register trio input
family = tribble(
~id, ~momcol, ~dadcol,
"pid", "mom", "dad",
"sib", "mom", "dad",
"mhs", "mom", "dad2",
"phs", "mom2", "dad",
"mom", "mgm", "mgf",
"dad", "pgm", "pgf",
"dad2", "pgm2", "pgf2",
"paunt", "pgm", "pgf",
"pacousin", "paunt", "pauntH",
"hspaunt", "pgm", "newpgf",
"hspacousin", "hspaunt", "hspauntH",
"puncle", "pgm", "pgf",
"pucousin", "puncleW", "puncle",
"maunt", "mgm", "mgf",
"macousin", "maunt", "mauntH",
"hsmuncle", "newmgm", "mgf",
"hsmucousin", "hsmuncleW", "hsmuncle"
) %>% print()
## # A tibble: 17 × 3
## id momcol dadcol
## <chr> <chr> <chr>
## 1 pid mom dad
## 2 sib mom dad
## 3 mhs mom dad2
## 4 phs mom2 dad
## 5 mom mgm mgf
## 6 dad pgm pgf
## 7 dad2 pgm2 pgf2
## 8 paunt pgm pgf
## 9 pacousin paunt pauntH
## 10 hspaunt pgm newpgf
## 11 hspacousin hspaunt hspauntH
## 12 puncle pgm pgf
## 13 pucousin puncleW puncle
## 14 maunt mgm mgf
## 15 macousin maunt mauntH
## 16 hsmuncle newmgm mgf
## 17 hsmucousin hsmuncleW hsmuncle
# random thresholds to store as attributes
thrs = tibble(
id = family %>% select(1:3) %>% unlist() %>% unique(),
lower = sample(c(-Inf, 2), size = length(id), replace = TRUE),
upper = sample(c(2, Inf), size = length(id), replace = TRUE)) %>% print()
## # A tibble: 31 × 3
## id lower upper
## <chr> <dbl> <dbl>
## 1 pid -Inf 2
## 2 sib 2 2
## 3 mhs 2 Inf
## 4 phs -Inf 2
## 5 mom 2 Inf
## 6 dad -Inf Inf
## 7 dad2 -Inf 2
## 8 paunt 2 2
## 9 pacousin 2 Inf
## 10 hspaunt -Inf 2
## # ℹ 21 more rows
With the above input, we can construct the graph for the population
specified in family
and attach the information in
thrs
:
graph = prepare_graph(.tbl = family,
thresholds = thrs,
fcol = "dadcol",
mcol = "momcol",
icol = "id")
graph
## IGRAPH 940c4ef DN-- 31 44 --
## + attr: name (v/c), lower (v/n), upper (v/n)
## + edges from 940c4ef (vertex names):
## [1] dad ->pid mom ->pid dad ->sib
## [4] mom ->sib dad2 ->mhs mom ->mhs
## [7] dad ->phs mom2 ->phs mgf ->mom
## [10] mgm ->mom pgf ->dad pgm ->dad
## [13] pgf2 ->dad2 pgm2 ->dad2 pgf ->paunt
## [16] pgm ->paunt pauntH ->pacousin paunt ->pacousin
## [19] newpgf ->hspaunt pgm ->hspaunt hspauntH->hspacousin
## [22] hspaunt ->hspacousin pgf ->puncle pgm ->puncle
## + ... omitted several edges
The object graph
is a directed graph with the
connections specified in the family
object and attributes
(attr) from thrs
. The package igraph can be used
to do any manipulation of the object graph
. The object
graph
only contains a single family in this example,
however, it could just as easily contain the entirety of the danish CPR
register, totalling more than \(10\)
million individuals. We can create a local graph (formally, a
neighborhood sub-graph) around a proband, only including individuals
that are within \(n\) steps of said
proband. The \(n\) steps correspond to
\(n\) degree relatives, such that \(n = 1\) yields all first degree relatives,
\(n = 2\) yields all second degree
relatives, etc.. From this local graph, we can construct a kinship
matrix and the covariance matrix required by
estimate_liability()
.
If we have constructed a graph on the entire Danish CPR register and a subset of these individuals have been genotyped, then we can create local graphs around each of the genotyped individuals. Here, we simply use every individual for the illustration:
genotyped_ids = V(graph)$name
family_graphs = tibble(
pid = genotyped_ids,
fam_graph = make_ego_graph(graph, order = 2, nodes = pid)
)
Here, family_graphs
is a tibble with an id column called
“pid” and a column called “fam_graph” with list of all the local graphs,
centred on the individual from “pid”.
We can construct a kinship matrix for a local graph in the following way:
get_kinship(family_graphs$fam_graph[[1]], h2 = 1, index_id = family_graphs$pid[1])
## pid sib mhs phs mom dad paunt puncle maunt mgm pgm mgf pgf pid_g
## pid 1.00 0.50 0.25 0.25 0.5 0.5 0.25 0.25 0.25 0.25 0.25 0.25 0.25 1.00
## sib 0.50 1.00 0.25 0.25 0.5 0.5 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.50
## mhs 0.25 0.25 1.00 0.00 0.5 0.0 0.00 0.00 0.25 0.25 0.00 0.25 0.00 0.25
## phs 0.25 0.25 0.00 1.00 0.0 0.5 0.25 0.25 0.00 0.00 0.25 0.00 0.25 0.25
## mom 0.50 0.50 0.50 0.00 1.0 0.0 0.00 0.00 0.50 0.50 0.00 0.50 0.00 0.50
## dad 0.50 0.50 0.00 0.50 0.0 1.0 0.50 0.50 0.00 0.00 0.50 0.00 0.50 0.50
## paunt 0.25 0.25 0.00 0.25 0.0 0.5 1.00 0.50 0.00 0.00 0.50 0.00 0.50 0.25
## puncle 0.25 0.25 0.00 0.25 0.0 0.5 0.50 1.00 0.00 0.00 0.50 0.00 0.50 0.25
## maunt 0.25 0.25 0.25 0.00 0.5 0.0 0.00 0.00 1.00 0.50 0.00 0.50 0.00 0.25
## mgm 0.25 0.25 0.25 0.00 0.5 0.0 0.00 0.00 0.50 1.00 0.00 0.00 0.00 0.25
## pgm 0.25 0.25 0.00 0.25 0.0 0.5 0.50 0.50 0.00 0.00 1.00 0.00 0.00 0.25
## mgf 0.25 0.25 0.25 0.00 0.5 0.0 0.00 0.00 0.50 0.00 0.00 1.00 0.00 0.25
## pgf 0.25 0.25 0.00 0.25 0.0 0.5 0.50 0.50 0.00 0.00 0.00 0.00 1.00 0.25
## pid_g 1.00 0.50 0.25 0.25 0.5 0.5 0.25 0.25 0.25 0.25 0.25 0.25 0.25 1.00
Note: individuals such as “hspaunt” are not present, since they are third degree relation.
With the object family_graphs
, we have constructed an
object with the required information and format for estimating the
genetic liabilities of a list of probands. We have automatically
extracted the family present for a given proband up to degree \(2\) (can be generalised to any integer
\(n\)), which allows us to construct
the kinship matrix. Next, we have attached the lower and upper threshold
for each individual during the construction of the graph, and as such
follow the indivdual in whatever local graph they are a part of.
Finally, we need a liability-scale heritability. We will simply use
\(h^2 = 0.5\) for illustrative
purposes. We will use estimate_liability()
in the folling
way:
ltfhpp = estimate_liability(
family_graphs = family_graphs,
h2 = .5,
pid = "pid",
family_graphs_col = "fam_graph"
) %>% print()
## The number of workers is 1
## # A tibble: 31 × 4
## fam_ID pid genetic_est genetic_se
## <chr> <chr> <dbl> <dbl>
## 1 pid pid 1.05 0.00263
## 2 sib sib 1.43 0.00165
## 3 mhs mhs 1.65 0.00197
## 4 phs phs 0.325 0.00347
## 5 mom mom 1.82 0.00179
## 6 dad dad 0.721 0.00306
## 7 dad2 dad2 0.779 0.00280
## 8 paunt paunt 1.23 0.00171
## 9 pacousin pacousin 1.62 0.00193
## 10 hspaunt hspaunt 0.597 0.00334
## # ℹ 21 more rows