Calculate (personalised) thresholds based on CIPs.
Source: R/Prepare_Graph_Input.R
prepare_thresholds.Rd
This function prepares input for estimate_liability by calculating thresholds based on stratified cumulative incidence proportions (CIPs), with optional interpolation for ages between CIP values. Given a tibble with families and family members and (stratified) CIPs, personalised thresholds are calculated for each individual present in .tbl. An individual may be in multiple families, but only once in the same family.
Usage
prepare_thresholds(
  .tbl,
  CIP,
  age_col,
  CIP_merge_columns = c("sex", "birth_year", "age"),
  CIP_cip_col = "cip",
  Kpop = "useMax",
  status_col = "status",
  lower_equal_upper = FALSE,
  personal_thr = FALSE,
  fid_col = "fid",
  personal_id_col = "pid",
  interpolation = NULL,
  bst.params = list(max_depth = 10, base_score = 0, nthread = 4, min_child_weight = 10),
  min_CIP_value = 1e-05,
  xgboost_itr = 30
)
Arguments
- .tbl
Tibble with family and personal id columns, as well as CIP_merge_columns and status.
- CIP
Tibble with population-representative cumulative incidence proportions. CIP must contain the columns from CIP_merge_columns and CIP_cip_col.
- age_col
Name of column with age at the end of follow-up or age at diagnosis for cases.
- CIP_merge_columns
The columns the CIPs are subset by, e.g. CIPs by birth_year, sex.
- CIP_cip_col
Name of column with CIP values.
- Kpop
Takes either "useMax" to use the maximum value in the CIP strata as population prevalence, or a tibble with population prevalence values based on other information. If a tibble is provided, it must contain columns from .tbl and a column named "K_pop" with population prevalence values. Defaults to "useMax".
- status_col
Column that contains the status of each family member. Coded as 0 or FALSE (control) and 1 or TRUE (case).
- lower_equal_upper
Should the upper and lower threshold be the same for cases? Can be used if CIPs are detailed, e.g. stratified by birth year and sex.
- personal_thr
Should thresholds be based on stratified CIPs or population prevalence?
- fid_col
Column that contains the family ID.
- personal_id_col
Column that contains the personal ID.
- interpolation
Type of interpolation, defaults to NULL.
- bst.params
List of parameters to pass on to xgboost. See xgboost documentation for details.
- min_CIP_value
Minimum CIP value to allow. Values that are too low may lead to numerical instability.
- xgboost_itr
Number of iterations to run xgboost for.
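As a rough illustration of the Kpop = "useMax" and min_CIP_value arguments described above, the base R sketch below takes the maximum CIP within each stratum as the population prevalence and then floors very small CIP values. The data are made up for illustration, and treating a stratum as the combination of sex and birth_year (CIP_merge_columns without age) is an assumption, not a statement about how prepare_thresholds is implemented internally.

```r
# Hypothetical CIP table, invented for illustration only
cip <- data.frame(
  sex        = c(1, 1, 1, 0, 0),
  birth_year = c(1975, 1975, 1975, 1981, 1981),
  age        = c(43, 45, 48, 40, 42),
  cip        = c(0.25, 0.28, 0.30, 0, 0.20)
)

# Assumption: a stratum is defined by sex and birth_year.
# The largest CIP in each stratum is used as the population prevalence.
cip$K_pop <- ave(cip$cip, cip$sex, cip$birth_year, FUN = max)

# Floor CIP values at a small positive number (cf. min_CIP_value = 1e-05),
# since a CIP of exactly 0 would give an infinite threshold.
cip$cip <- pmax(cip$cip, 1e-05)

cip
```

Computing K_pop before applying the floor keeps the prevalence tied to the observed CIP values; the order used by the package itself is not documented here.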
Value
Tibble with (personalised) thresholds for each family member (lower & upper), the calculated cumulative incidence proportion for each individual (K_i), and the population prevalence within an individual's CIP stratum (K_pop; max value in stratum). The thresholds and other potentially relevant information can be added to the family graphs with familywise_attach_attributes.
Examples
tbl = data.frame(
  fid = c(1, 1, 1, 1),
  pid = c(1, 2, 3, 4),
  role = c("o", "m", "f", "pgf"),
  sex = c(1, 0, 1, 1),
  status = c(0, 0, 1, 1),
  age = c(22, 42, 48, 78),
  birth_year = 2023 - c(22, 42, 48, 78),
  aoo = c(NA, NA, 43, 45))
cip = data.frame(
  age = c(22, 42, 43, 45, 48, 78),
  birth_year = c(2001, 1981, 1975, 1945, 1975, 1945),
  sex = c(1, 0, 1, 1, 1, 1),
  cip = c(0.1, 0.2, 0.3, 0.3, 0.3, 0.4))
prepare_thresholds(.tbl = tbl, CIP = cip, age_col = "age", interpolation = NA)
#>   fid pid role sex status age birth_year aoo cip       thr     lower     upper
#> 1   1   1    o   1      0  22       2001  NA 0.1 1.2815516      -Inf 1.2815516
#> 2   1   2    m   0      0  42       1981  NA 0.2 0.8416212      -Inf 0.8416212
#> 3   1   3    f   1      1  48       1975  43 0.3 0.5244005 0.5244005       Inf
#> 4   1   4  pgf   1      1  78       1945  45 0.4 0.2533471 0.2533471       Inf
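
The thr, lower and upper columns in this output are consistent with a standard normal liability scale, where the threshold is the upper-tail quantile of the CIP. The base R sketch below reproduces the printed numbers for the default lower_equal_upper = FALSE; it is a minimal illustration under that assumption and not necessarily how prepare_thresholds computes them internally.

```r
# CIP values and case status taken from the example output above
cip_values <- c(0.1, 0.2, 0.3, 0.4)
status     <- c(0, 0, 1, 1)

# Threshold on the standard normal scale: P(liability > thr) = CIP
thr <- qnorm(cip_values, lower.tail = FALSE)

# Controls have liability below the threshold, cases above it
lower <- ifelse(status == 1, thr, -Inf)
upper <- ifelse(status == 1, Inf, thr)

data.frame(cip = cip_values, thr = thr, lower = lower, upper = upper)
```

Controls get the interval (-Inf, thr) and cases get (thr, Inf), matching rows 1 to 4 of the output.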