simulate_under_LTM
simulates families and thresholds under
the liability threshold model for a given family structure and a
variable number of phenotypes.Please note that it is not possible
to simulate different family structures.
Usage
simulate_under_LTM(
fam_vec = c("m", "f", "s1", "mgm", "mgf", "pgm", "pgf"),
n_fam = NULL,
add_ind = TRUE,
h2 = 0.5,
genetic_corrmat = NULL,
full_corrmat = NULL,
phen_names = NULL,
n_sim = 1000,
pop_prev = 0.1
)
Arguments
- fam_vec
A vector of strings holding the different family members. All family members must be represented by strings from the following list:
m
(Mother)f
(Father)c[0-9]*.[0-9]*
(Children)mgm
(Maternal grandmother)mgf
(Maternal grandfather)pgm
(Paternal grandmother)pgf
(Paternal grandfather)s[0-9]*
(Full siblings)mhs[0-9]*
(Half-siblings - maternal side)phs[0-9]*
(Half-siblings - paternal side)mau[0-9]*
(Aunts/Uncles - maternal side)pau[0-9]*
(Aunts/Uncles - paternal side). Defaults toc("m","f","s1","mgm","mgf","pgm","pgf")
.
- n_fam
A named vector holding the desired number of family members. See
setNames
. All names must be picked from the list mentioned above. Defaults toNULL
.- add_ind
A logical scalar indicating whether the genetic component of the full liability as well as the full liability for the underlying target individual should be included in the covariance matrix. Defaults to
TRUE
.- h2
Either a number or a numeric vector holding the liability-scale heritability(ies) for one or more phenotypes. All entries in
h2
must be non-negative. Note that under the liability threshold model, the heritabilities must also be at most 1. Defaults to 0.5.- genetic_corrmat
Either
NULL
or a numeric matrix holding the genetic correlations between the desired phenotypes. Must be specified, iflength(h2)
\(>0\), and will be ignored ifh2
is a number. All diagonal entries ingenetic_corrmat
must be equal to one, while all off-diagonal entries must be between -1 and 1. In addition, the matrix must be symmetric. Defaults toNULL
.- full_corrmat
Either
NULL
or a numeric matrix holding the full correlations between the desired phenotypes. Must be specified, iflength(h2)
\(>0\), and will be ignored ifh2
is a number. All diagonal entries infull_corrmat
must be equal to one, while all off-diagonal entries must be between -1 and 1. In addition, the matrix must be symmetric. Defaults toNULL
.- phen_names
Either
NULL
or character vector holding the phenotype names. These names will be used to create the row and column names for the covariance matrix. Must be specified, iflength(h2)
\(> 0\), and will be ignored ifh2
is a number. If it is not specified, the names will default to phenotype1, phenotype2, etc. Defaults toNULL
.- n_sim
A positive number representing the number of simulations. Defaults to 1000.
- pop_prev
Either a number or a numeric vector holding the population prevalence(s), i.e. the overall prevalence(s) in the population. All entries in
pop_prev
must be positive and smaller than 1. Defaults to 0.1.
Value
If either fam_vec
or n_fam
is used as the argument,
if it is of the required format, if the liability-scale heritability h2
is a number satisfying \(0 \leq h^2\), n_sim
is a strictly positive number,
and pop_prev
is a positive number that is at most one,
then the output will be a list containing two tibbles.
The first tibble, sim_obs
, holds the simulated liabilities, the disease
status and the current age/age-of-onset for all family members in each of the
n_sim
families.
The second tibble, thresholds
, holds the family identifier, the personal
identifier, the role (specified in fam_vec or n_fam) as well as the lower and
upper thresholds for all individuals in all families. Note that this tibble has
the format required in estimate_liability
.
If either fam_vec
or n_fam
is used as the argument and if it is of the
required format, if genetic_corrmat
and full_corrmat
are two numeric
and symmetric matrices satisfying that all diagonal entries are one and that all
off-diagonal entries are between -1 and 1, if the liability-scale heritabilities in
h2_vec
are numbers satisfying \(0 \leq h^2_i\) for all \(i \in \{1,...,n_pheno\}\),
n_sim
is a strictly positive number, and pop_prev
is a positive numeric
vector such that all entries are at most one, then the output will be a list containing
the following lists.
The first outer list, which is named after the first phenotype in phen_names
,
holds the tibble sim_obs
, which holds the simulated liabilities, the
disease status and the current age/age-of-onset for all family members in each of
the n_sim
families for the first phenotype.
As the first outer list, the second outer list, which is named after the second
phenotype in phen_names
, holds the tibble sim_obs
, which holds
the simulated liabilities, the disease status and the current age/age-of-onset
for all family members in each of the n_sim
families for the second phenotype.
There is a list containing sim_obs
for each phenotype in phen_names
.
The last list entry, thresholds
, holds the family identifier, the personal
identifier, the role (specified in fam_vec or n_fam) as well as the lower and
upper thresholds for all individuals in all families and all phenotypes.
Note that this tibble has the format required in estimate_liability
.
Finally, note that if neither fam_vec
nor n_fam
are specified, the function
returns the disease status, the current age/age-of-onset, the lower and upper
thresholds, as well as the personal identifier for a single individual, namely
the individual under consideration (called o
).
If both fam_vec
and n_fam
are defined, the user is asked to '
decide on which of the two vectors to use.
Details
This function can be used to simulate the case-control status, the current
age and age-of-onset as well as the lower and upper thresholds for
a variable number of phenotypes for all family members in each of
the n_sim
families.
If h2
is a number, simulate_under_LTM
simulates the case-
control status, the current age and age-of-onset as well as thresholds
for a single phenotype.
However, if h2
is a numeric vector, if genetic_corrmat
and
full_corrmat
are two symmetric correlation matrices, and if
phen_names
and pop_prev
are to numeric vectors holding
the phenotype names and the population prevalences, respectively, then
simulate_under_LTM
simulates the case-control status, the current
age and age-of-onset as well as thresholds for two or more (correlated)
phenotypes.
The family members can be specified using one of two possible formats.
Examples
simulate_under_LTM()
#> $sim_obs
#> # A tibble: 1,000 × 26
#> fam_ID g o m f s1 mgm mgf pgm pgf
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 fam_ID… 1.37 2.47 -0.183 0.229 -0.631 -0.995 -0.341 -0.363 0.229
#> 2 fam_ID… 1.03 1.25 1.13 2.69 1.37 -0.425 -0.630 0.0554 0.456
#> 3 fam_ID… 1.33 1.06 0.620 -0.656 1.81 -0.943 0.975 -0.335 1.64
#> 4 fam_ID… -1.40 -0.564 -1.09 -1.76 -3.10 -0.871 0.158 -1.16 -0.635
#> 5 fam_ID… -0.915 -1.11 -0.458 0.0783 -0.845 -0.295 0.529 -1.02 0.396
#> 6 fam_ID… 1.07 0.921 1.34 1.07 2.16 1.47 0.352 0.401 0.0716
#> 7 fam_ID… 0.579 -0.386 -1.12 1.05 1.41 -0.0144 0.0586 -0.00135 -0.986
#> 8 fam_ID… 0.105 -1.27 -0.231 -0.599 0.381 0.855 2.41 -0.759 -0.848
#> 9 fam_ID… -0.0266 1.30 -0.954 1.05 -1.15 -1.07 -1.41 0.0881 -0.283
#> 10 fam_ID… 0.567 0.447 0.361 0.463 -1.00 1.23 2.17 0.870 0.335
#> # ℹ 990 more rows
#> # ℹ 16 more variables: o_status <lgl>, m_status <lgl>, f_status <lgl>,
#> # s1_status <lgl>, mgm_status <lgl>, mgf_status <lgl>, pgm_status <lgl>,
#> # pgf_status <lgl>, o_aoo <dbl>, m_aoo <dbl>, f_aoo <dbl>, s1_aoo <dbl>,
#> # mgm_aoo <dbl>, mgf_aoo <dbl>, pgm_aoo <dbl>, pgf_aoo <dbl>
#>
#> $thresholds
#> # A tibble: 8,000 × 5
#> fam_ID indiv_ID role lower upper
#> <chr> <chr> <chr> <dbl> <dbl>
#> 1 fam_ID_1 fam_ID_1_1 o 2.47 2.47
#> 2 fam_ID_2 fam_ID_2_1 o -Inf 3.42
#> 3 fam_ID_3 fam_ID_3_1 o -Inf 3.48
#> 4 fam_ID_4 fam_ID_4_1 o -Inf 3.35
#> 5 fam_ID_5 fam_ID_5_1 o -Inf 2.83
#> 6 fam_ID_6 fam_ID_6_1 o -Inf 3.28
#> 7 fam_ID_7 fam_ID_7_1 o -Inf 3.52
#> 8 fam_ID_8 fam_ID_8_1 o -Inf 2.51
#> 9 fam_ID_9 fam_ID_9_1 o 1.30 1.30
#> 10 fam_ID_10 fam_ID_10_1 o -Inf 3.28
#> # ℹ 7,990 more rows
#>
genetic_corrmat <- matrix(0.4, 3, 3)
diag(genetic_corrmat) <- 1
full_corrmat <- matrix(0.6, 3, 3)
diag(full_corrmat) <- 1
simulate_under_LTM(fam_vec = NULL, n_fam = stats::setNames(c(1,1,1,2,2),
c("m","mgm","mgf","s","mhs")))
#> $sim_obs
#> # A tibble: 1,000 × 26
#> fam_ID g o m mgm mgf s1 s2 mhs1 mhs2
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 fam_ID… 0.157 0.329 0.480 -0.843 -0.496 -0.186 -0.0217 -0.439 -0.507
#> 2 fam_ID… -1.56 -1.24 -1.71 -0.222 -1.46 -0.478 0.175 -1.23 0.176
#> 3 fam_ID… -0.384 -0.0760 -1.77 -1.10 -1.15 -1.59 -0.910 -0.531 -0.386
#> 4 fam_ID… 0.513 0.187 -0.0519 0.292 -0.772 1.40 0.868 -1.20 0.507
#> 5 fam_ID… -1.29 -1.91 -2.06 -1.14 0.0605 0.00188 -0.841 -0.475 -2.38
#> 6 fam_ID… -0.0984 -0.642 -0.802 1.44 -2.50 -1.07 0.933 0.399 -2.62
#> 7 fam_ID… -2.20 -1.99 -1.54 -0.535 -0.358 -1.94 -2.01 -2.52 -1.31
#> 8 fam_ID… -0.323 -0.311 -1.40 -0.208 -0.146 -0.827 1.02 -0.537 -1.07
#> 9 fam_ID… 0.483 3.10 1.20 -0.364 -0.805 0.350 0.550 -0.420 0.235
#> 10 fam_ID… -0.503 -1.27 0.730 1.66 1.91 -0.120 -0.352 0.926 0.465
#> # ℹ 990 more rows
#> # ℹ 16 more variables: o_status <lgl>, m_status <lgl>, mgm_status <lgl>,
#> # mgf_status <lgl>, s1_status <lgl>, s2_status <lgl>, mhs1_status <lgl>,
#> # mhs2_status <lgl>, o_aoo <dbl>, m_aoo <dbl>, mgm_aoo <dbl>, mgf_aoo <dbl>,
#> # s1_aoo <dbl>, s2_aoo <dbl>, mhs1_aoo <dbl>, mhs2_aoo <dbl>
#>
#> $thresholds
#> # A tibble: 8,000 × 5
#> fam_ID indiv_ID role lower upper
#> <chr> <chr> <chr> <dbl> <dbl>
#> 1 fam_ID_1 fam_ID_1_1 o -Inf 3.35
#> 2 fam_ID_2 fam_ID_2_1 o -Inf 2.59
#> 3 fam_ID_3 fam_ID_3_1 o -Inf 3.42
#> 4 fam_ID_4 fam_ID_4_1 o -Inf 3.52
#> 5 fam_ID_5 fam_ID_5_1 o -Inf 2.87
#> 6 fam_ID_6 fam_ID_6_1 o -Inf 2.87
#> 7 fam_ID_7 fam_ID_7_1 o -Inf 2.95
#> 8 fam_ID_8 fam_ID_8_1 o -Inf 2.59
#> 9 fam_ID_9 fam_ID_9_1 o 3.10 3.10
#> 10 fam_ID_10 fam_ID_10_1 o -Inf 3.10
#> # ℹ 7,990 more rows
#>
simulate_under_LTM(fam_vec = c("m","f","s1"), n_fam = NULL, add_ind = FALSE,
genetic_corrmat = genetic_corrmat, full_corrmat = full_corrmat, n_sim = 200)
#> $sim_obs
#> # A tibble: 200 × 10
#> fam_ID m f s1 m_status f_status s1_status m_aoo f_aoo s1_aoo
#> <chr> <dbl> <dbl> <dbl> <lgl> <lgl> <lgl> <dbl> <dbl> <dbl>
#> 1 fam_ID_1 -1.80 -0.155 -0.625 FALSE FALSE FALSE 49 55 26
#> 2 fam_ID_2 -0.384 0.272 -1.45 FALSE FALSE FALSE 46 45 26
#> 3 fam_ID_3 -0.571 -1.07 -0.979 FALSE FALSE FALSE 52 46 22
#> 4 fam_ID_4 1.31 -0.456 -0.0634 TRUE FALSE FALSE 83 46 18
#> 5 fam_ID_5 0.326 0.155 -0.851 FALSE FALSE FALSE 62 55 37
#> 6 fam_ID_6 0.322 0.812 0.264 FALSE FALSE FALSE 53 64 35
#> 7 fam_ID_7 1.40 1.31 1.55 TRUE TRUE TRUE 71 83 63
#> 8 fam_ID_8 1.22 0.179 0.606 FALSE FALSE FALSE 44 39 16
#> 9 fam_ID_9 0.314 1.20 1.79 FALSE FALSE TRUE 39 41 56
#> 10 fam_ID_… -1.05 -0.354 -0.909 FALSE FALSE FALSE 39 38 14
#> # ℹ 190 more rows
#>
#> $thresholds
#> # A tibble: 600 × 5
#> fam_ID indiv_ID role lower upper
#> <chr> <chr> <chr> <dbl> <dbl>
#> 1 fam_ID_1 fam_ID_1_1 m -Inf 2.05
#> 2 fam_ID_2 fam_ID_2_1 m -Inf 2.18
#> 3 fam_ID_3 fam_ID_3_1 m -Inf 1.93
#> 4 fam_ID_4 fam_ID_4_1 m 1.31 1.31
#> 5 fam_ID_5 fam_ID_5_1 m -Inf 1.59
#> 6 fam_ID_6 fam_ID_6_1 m -Inf 1.89
#> 7 fam_ID_7 fam_ID_7_1 m 1.41 1.41
#> 8 fam_ID_8 fam_ID_8_1 m -Inf 2.26
#> 9 fam_ID_9 fam_ID_9_1 m -Inf 2.47
#> 10 fam_ID_10 fam_ID_10_1 m -Inf 2.47
#> # ℹ 590 more rows
#>
simulate_under_LTM(fam_vec = c(), n_fam = NULL, add_ind = TRUE, h2 = 0.5,
n_sim = 200, pop_prev = 0.05)
#> Warning: Neither fam_vec nor n_fam is specified...
#> $sim_obs
#> # A tibble: 200 × 5
#> fam_ID g o o_status o_aoo
#> <chr> <dbl> <dbl> <lgl> <dbl>
#> 1 fam_ID_1 0.727 1.12 FALSE 25
#> 2 fam_ID_2 1.07 2.42 TRUE 46
#> 3 fam_ID_3 1.71 1.31 FALSE 39
#> 4 fam_ID_4 -0.257 0.388 FALSE 27
#> 5 fam_ID_5 -1.17 -1.66 FALSE 13
#> 6 fam_ID_6 -0.639 -0.0206 FALSE 15
#> 7 fam_ID_7 0.315 0.115 FALSE 10
#> 8 fam_ID_8 -0.397 -0.277 FALSE 24
#> 9 fam_ID_9 0.556 0.633 FALSE 39
#> 10 fam_ID_10 0.932 1.77 TRUE 69
#> # ℹ 190 more rows
#>
#> $thresholds
#> # A tibble: 200 × 5
#> fam_ID indiv_ID role lower upper
#> <chr> <chr> <chr> <dbl> <dbl>
#> 1 fam_ID_1 fam_ID_1_1 o -Inf 3.23
#> 2 fam_ID_2 fam_ID_2_1 o 2.44 2.44
#> 3 fam_ID_3 fam_ID_3_1 o -Inf 2.71
#> 4 fam_ID_4 fam_ID_4_1 o -Inf 3.16
#> 5 fam_ID_5 fam_ID_5_1 o -Inf 3.63
#> 6 fam_ID_6 fam_ID_6_1 o -Inf 3.57
#> 7 fam_ID_7 fam_ID_7_1 o -Inf 3.73
#> 8 fam_ID_8 fam_ID_8_1 o -Inf 3.26
#> 9 fam_ID_9 fam_ID_9_1 o -Inf 2.71
#> 10 fam_ID_10 fam_ID_10_1 o 1.78 1.78
#> # ℹ 190 more rows
#>