Calculate (personalised) thresholds based on CIPs.
Source: R/Prepare_Graph_Input.R
This function prepares input for estimate_liability() by calculating thresholds based on stratified cumulative incidence proportions (CIPs), with optional interpolation for ages that fall between the ages at which CIP values are given. Given a tibble of families and family members, together with (stratified) CIPs, personalised thresholds are calculated for each individual present in .tbl. An individual may appear in multiple families, but only once within the same family.
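A minimal base-R sketch of the core idea, not the package's implementation: each individual is matched to a CIP on the CIP_merge_columns, and the CIP is turned into a liability threshold assuming a standard normal liability, so that the probability of exceeding the threshold equals the CIP. The data are taken from the Examples section, and the thresholds computed this way agree with the thr column shown there. Interpolation, Kpop and the xgboost options are not covered.

```r
# Family members and CIPs, taken from the Examples section.
tbl <- data.frame(
  pid = 1:4,
  sex = c(1, 0, 1, 1),
  birth_year = 2023 - c(22, 42, 48, 78),
  age = c(22, 42, 48, 78)
)
cip <- data.frame(
  age = c(22, 42, 43, 45, 48, 78),
  birth_year = c(2001, 1981, 1975, 1945, 1975, 1945),
  sex = c(1, 0, 1, 1, 1, 1),
  cip = c(0.1, 0.2, 0.3, 0.3, 0.3, 0.4)
)

# Match each individual to the CIP of their stratum and age
# (the default CIP_merge_columns); merge() sorts by key, so restore pid order.
merged <- merge(tbl, cip, by = c("sex", "birth_year", "age"), all.x = TRUE)
merged <- merged[order(merged$pid), ]

# Standard normal liability: the threshold is the upper-tail quantile,
# i.e. P(liability > thr) = cip.
merged$thr <- qnorm(merged$cip, lower.tail = FALSE)
print(merged[, c("pid", "cip", "thr")])
```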
Usage
prepare_thresholds(
  .tbl,
  CIP,
  age_col,
  CIP_merge_columns = c("sex", "birth_year", "age"),
  CIP_cip_col = "cip",
  Kpop = "useMax",
  status_col = "status",
  lower_equal_upper = FALSE,
  personal_thr = FALSE,
  fid_col = "fid",
  personal_id_col = "pid",
  interpolation = NULL,
  bst.params = list(max_depth = 10, base_score = 0, nthread = 4, min_child_weight = 10),
  min_CIP_value = 1e-05,
  xgboost_itr = 30
)
Arguments
- .tbl
Tibble with family and personal ID columns, as well as the columns named in CIP_merge_columns and status_col.
- CIP
Tibble with population-representative cumulative incidence proportions. CIP must contain the columns named in CIP_merge_columns and CIP_cip_col.
- age_col
Name of column with age at the end of follow-up or age at diagnosis for cases.
- CIP_merge_columns
The columns the CIPs are stratified by (e.g. birth_year and sex, plus age in the default), used to match each individual in .tbl to a CIP value.
- CIP_cip_col
Name of column with CIP values.
- Kpop
Either "useMax", to use the maximum CIP value within a stratum as the population prevalence, or a tibble with population prevalence values based on other information. If a tibble is provided, it must contain columns from .tbl and a column named "K_pop" with population prevalence values. Defaults to "useMax".
- status_col
Column that contains the status of each family member. Coded as 0 or FALSE (control) and 1 or TRUE (case).
- lower_equal_upper
Should the upper and lower thresholds be the same for cases? This can be used if the CIPs are detailed, e.g. stratified by birth year and sex.
- personal_thr
Should thresholds be personalised, i.e. based on stratified CIPs (TRUE), or based on population prevalence (FALSE)?
- fid_col
Column that contains the family ID.
- personal_id_col
Column that contains the personal ID.
- interpolation
Type of interpolation, defaults to NULL.
- bst.params
List of parameters to pass on to xgboost. See xgboost documentation for details.
- min_CIP_value
Minimum CIP value allowed. Very low values may lead to numerical instability.
- xgboost_itr
Number of iterations to run xgboost for.
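The sketch below illustrates two of the arguments above in base R. It assumes that, for Kpop = "useMax", a stratum is defined by sex and birth_year (age excluded), and that min_CIP_value acts as a lower floor on CIP values; both are assumptions for illustration, not taken from the package source.

```r
# CIP table from the Examples section.
cip <- data.frame(
  age = c(22, 42, 43, 45, 48, 78),
  birth_year = c(2001, 1981, 1975, 1945, 1975, 1945),
  sex = c(1, 0, 1, 1, 1, 1),
  cip = c(0.1, 0.2, 0.3, 0.3, 0.3, 0.4)
)

# Kpop = "useMax" (assumed stratum: sex x birth_year): the population
# prevalence of a stratum is the largest CIP observed in that stratum.
cip$K_pop <- ave(cip$cip, cip$sex, cip$birth_year, FUN = max)
print(cip)

# min_CIP_value (assumed behaviour: a floor on CIPs). Without it, a CIP of 0
# gives an infinite threshold, since qnorm(0, lower.tail = FALSE) is Inf.
min_CIP_value <- 1e-05
raw_cip <- c(0, 1e-08, 0.1)
floored_cip <- pmax(raw_cip, min_CIP_value)
thr <- qnorm(floored_cip, lower.tail = FALSE)
print(thr)
```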
Value
Tibble with (personalised) thresholds for each family member (lower and upper), the calculated cumulative incidence proportion for each individual (K_i), and the population prevalence within each individual's CIP stratum (K_pop; the maximum value in the stratum). The thresholds and other potentially relevant information can be added to the family graphs with familywise_attach_attributes().
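As an illustration of the lower and upper columns, the hypothetical helper sketch_bounds() below (not part of the package) reproduces the pattern seen in the Examples output: controls get the interval (-Inf, thr] and cases [thr, Inf). How lower_equal_upper = TRUE is handled is an assumption here: both bounds are set to thr for cases.

```r
# Hypothetical helper, not the package's code.
# Controls: liability below the threshold; cases: liability above it.
sketch_bounds <- function(thr, status, lower_equal_upper = FALSE) {
  status <- as.logical(status)  # accepts 0/1 or FALSE/TRUE
  lower <- ifelse(status, thr, -Inf)
  # Assumption: with lower_equal_upper = TRUE, a case's upper bound equals thr.
  upper <- ifelse(status, if (lower_equal_upper) thr else Inf, thr)
  data.frame(lower = lower, upper = upper)
}

# Thresholds for the CIPs 0.1, 0.2, 0.3, 0.4 and statuses from the Examples.
thr <- qnorm(c(0.1, 0.2, 0.3, 0.4), lower.tail = FALSE)
b <- sketch_bounds(thr, c(0, 0, 1, 1))
b_equal <- sketch_bounds(thr, c(0, 0, 1, 1), lower_equal_upper = TRUE)
print(b)
print(b_equal)
```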
Examples
tbl = data.frame(
fid = c(1, 1, 1, 1),
pid = c(1, 2, 3, 4),
role = c("o", "m", "f", "pgf"),
sex = c(1, 0, 1, 1),
status = c(0, 0, 1, 1),
age = c(22, 42, 48, 78),
birth_year = 2023 - c(22, 42, 48, 78),
aoo = c(NA, NA, 43, 45))
cip = data.frame(
age = c(22, 42, 43, 45, 48, 78),
birth_year = c(2001, 1981, 1975, 1945, 1975, 1945),
sex = c(1, 0, 1, 1, 1, 1),
cip = c(0.1, 0.2, 0.3, 0.3, 0.3, 0.4))
prepare_thresholds(.tbl = tbl, CIP = cip, age_col = "age", interpolation = NA)
#> fid pid role sex status age birth_year aoo cip thr lower upper
#> 1 1 1 o 1 0 22 2001 NA 0.1 1.2815516 -Inf 1.2815516
#> 2 1 2 m 0 0 42 1981 NA 0.2 0.8416212 -Inf 0.8416212
#> 3 1 3 f 1 1 48 1975 43 0.3 0.5244005 0.5244005 Inf
#> 4 1 4 pgf 1 1 78 1945 45 0.4 0.2533471 0.2533471 Inf