
This function prepares input for estimate_liability by calculating thresholds based on stratified cumulative incidence proportions (CIPs), with the option of interpolating CIPs for ages that fall between tabulated CIP values. Given a tibble of families and family members (.tbl) and a tibble of (stratified) CIPs (CIP), personalised thresholds are calculated for each individual present in .tbl. An individual may belong to multiple families, but may appear only once within the same family.
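The core mapping from CIPs to thresholds can be sketched in base R. Under the liability threshold model, a CIP of K corresponds to the point above which a proportion K of a standard normal liability lies, i.e. qnorm(K, lower.tail = FALSE). This reproduces the thr column in the Examples output. The sketch below is a minimal illustration, not the package's implementation: the helper sketch_thresholds is hypothetical, it performs no interpolation, and it assumes exact matches on the merge columns, fid and pid columns, and the lower_equal_upper = FALSE behaviour seen in the example output.

```r
# Hypothetical helper: illustrates the CIP -> threshold mapping only.
sketch_thresholds <- function(tbl, cip,
                              merge_cols = c("sex", "birth_year", "age"),
                              status_col = "status") {
  # Attach each individual's stratum-specific CIP (inner join on merge_cols)
  out <- merge(tbl, cip, by = merge_cols, sort = FALSE)
  # Threshold on the standard normal liability scale with `cip` mass above it
  out$thr <- qnorm(out$cip, lower.tail = FALSE)
  # status coded 0/FALSE (control) or 1/TRUE (case)
  is_case <- out[[status_col]] == 1
  # Cases: liability above the threshold; controls: at or below it
  out$lower <- ifelse(is_case, out$thr, -Inf)
  out$upper <- ifelse(is_case, Inf, out$thr)
  # Restore a stable row order (assumes fid and pid columns, as in the example)
  out[order(out$fid, out$pid), ]
}

tbl <- data.frame(fid = c(1, 1, 1, 1), pid = c(1, 2, 3, 4),
                  sex = c(1, 0, 1, 1), status = c(0, 0, 1, 1),
                  age = c(22, 42, 48, 78), birth_year = 2023 - c(22, 42, 48, 78))
cip <- data.frame(age = c(22, 42, 43, 45, 48, 78),
                  birth_year = c(2001, 1981, 1975, 1945, 1975, 1945),
                  sex = c(1, 0, 1, 1, 1, 1),
                  cip = c(0.1, 0.2, 0.3, 0.3, 0.3, 0.4))
res <- sketch_thresholds(tbl, cip)
round(res$thr, 7)  # 1.2815516 0.8416212 0.5244005 0.2533471
```

The upper-tail quantile is used rather than qnorm(1 - K) because it avoids cancellation error for very small K.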

Usage

prepare_thresholds(
  .tbl,
  CIP,
  age_col,
  CIP_merge_columns = c("sex", "birth_year", "age"),
  CIP_cip_col = "cip",
  Kpop = "useMax",
  status_col = "status",
  lower_equal_upper = FALSE,
  personal_thr = FALSE,
  fid_col = "fid",
  personal_id_col = "pid",
  interpolation = NULL,
  bst.params = list(max_depth = 10, base_score = 0, nthread = 4, min_child_weight = 10),
  min_CIP_value = 1e-05,
  xgboost_itr = 30
)

Arguments

.tbl

Tibble with family ID and personal ID columns, as well as the columns in CIP_merge_columns and the status column.

CIP

Tibble with population-representative cumulative incidence proportions. CIP must contain the columns in CIP_merge_columns and CIP_cip_col.

age_col

Name of the column with age at the end of follow-up, or age at diagnosis for cases.

CIP_merge_columns

The columns the CIPs are stratified by, which are also used to match each individual in .tbl to a CIP value, e.g. CIPs by sex, birth_year and age.

CIP_cip_col

Name of column with CIP values.

Kpop

Takes either "useMax", to use the maximum CIP value in each CIP stratum as the population prevalence, or a tibble with population prevalence values based on other information. If a tibble is provided, it must contain columns from .tbl and a column named "K_pop" with the population prevalence values. Defaults to "useMax".
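What "useMax" computes can be sketched with base R's ave(). This is an illustration under an assumption this page does not confirm: a stratum is taken to be every CIP_merge_columns entry except age (here sex and birth_year), so the maximum runs over all tabulated ages in that stratum.

```r
# Sketch of Kpop = "useMax": the largest CIP within each stratum.
# Assumption: strata are sex x birth_year, i.e. the merge columns minus age.
cip <- data.frame(
  age = c(22, 42, 43, 45, 48, 78),
  birth_year = c(2001, 1981, 1975, 1945, 1975, 1945),
  sex = c(1, 0, 1, 1, 1, 1),
  cip = c(0.1, 0.2, 0.3, 0.3, 0.3, 0.4)
)
# ave() applies max() within each sex x birth_year group and recycles the
# result back onto every row of that group
cip$K_pop <- ave(cip$cip, cip$sex, cip$birth_year, FUN = max)
print(cip)
```

Because CIPs are non-decreasing in age, the stratum maximum acts as a lifetime prevalence for that stratum.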

status_col

Column that contains the status of each family member. Coded as 0 or FALSE (control) and 1 or TRUE (case).

lower_equal_upper

Should the lower and upper thresholds be identical for cases? This can be used if the CIPs are detailed, e.g. stratified by birth year and sex.
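The effect of lower_equal_upper on the liability interval of a case can be sketched as follows. The helper case_control_bounds is hypothetical, and it assumes that when the option is TRUE both bounds of a case are set to that case's own threshold. Controls are unaffected in either setting.

```r
# Hypothetical helper, not the package's code: shows how lower_equal_upper
# changes the liability interval of cases.
case_control_bounds <- function(thr, status, lower_equal_upper = FALSE) {
  # status may be coded 0/1 or FALSE/TRUE, as described for status_col
  is_case <- as.logical(status)
  # Controls: no lower bound; cases: bounded below by their threshold
  lower <- ifelse(is_case, thr, -Inf)
  # Cases: unbounded above, or pinned to the threshold when lower_equal_upper
  upper <- ifelse(is_case & !lower_equal_upper, Inf, thr)
  data.frame(status = status, lower = lower, upper = upper)
}

thr <- qnorm(c(0.1, 0.3), lower.tail = FALSE)
open_bounds <- case_control_bounds(thr, status = c(0, 1))
pinned_bounds <- case_control_bounds(thr, status = c(FALSE, TRUE),
                                     lower_equal_upper = TRUE)
print(open_bounds)
print(pinned_bounds)
```

Pinning both bounds treats the case's threshold as an exact liability value, which is only reasonable when the CIPs are finely stratified.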

personal_thr

Should thresholds be based on stratified CIPs or population prevalence?

fid_col

Column that contains the family ID.

personal_id_col

Column that contains the personal ID.

interpolation

Type of interpolation to use for ages that fall between CIP values. Defaults to NULL.
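To illustrate what interpolating CIPs between tabulated ages means, here is a linear-interpolation sketch using stats::approx(). Linear interpolation is a swapped-in technique chosen for clarity and may differ from the methods prepare_thresholds supports (the bst.params and xgboost_itr arguments suggest an xgboost-based option). The helper interpolate_cip and the stratum values are made up for illustration.

```r
# Sketch: linear interpolation of CIPs for ages between tabulated ages.
# interpolate_cip is a hypothetical helper, not part of the package.
interpolate_cip <- function(ages, cips, new_ages) {
  # rule = 2 returns the nearest boundary CIP for ages outside the table
  approx(x = ages, y = cips, xout = new_ages, rule = 2)$y
}

# One made-up stratum (e.g. a single sex x birth_year combination)
stratum_ages <- c(40, 50, 60)
stratum_cips <- c(0.10, 0.20, 0.25)
k_i <- interpolate_cip(stratum_ages, stratum_cips, new_ages = c(45, 55, 70))
print(k_i)  # 0.150 0.225 0.250
```

Interpolation must be done within a stratum; mixing strata would blend CIPs of, for example, different sexes.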

bst.params

List of parameters to pass on to xgboost. See xgboost documentation for details.

min_CIP_value

Minimum CIP value allowed. Very low values may lead to numerical instability.
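The reason for a floor on CIP values can be shown directly: a CIP of 0 maps to an infinite threshold, and values very close to 0 map to extreme ones. The sketch assumes that values below min_CIP_value are raised to it with pmax(); how the package applies the floor is not stated here.

```r
# Sketch: why a minimum CIP value keeps thresholds finite.
# Assumption: CIPs below the floor are raised to min_CIP_value.
min_CIP_value <- 1e-05
raw_cip <- c(0, 1e-08, 0.01)
raw_thr <- qnorm(raw_cip, lower.tail = FALSE)  # first value is Inf
floored <- pmax(raw_cip, min_CIP_value)        # raise values below the floor
thr <- qnorm(floored, lower.tail = FALSE)      # all finite
print(raw_thr)
print(thr)
```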

xgboost_itr

Number of iterations to run xgboost for.

Value

Tibble with (personalised) lower and upper thresholds for each family member, the calculated cumulative incidence proportion for each individual (K_i), and the population prevalence within each individual's CIP stratum (K_pop; the maximum value in the stratum when Kpop = "useMax"). The thresholds and other potentially relevant information can be added to the family graphs with familywise_attach_attributes.

Examples

tbl = data.frame(
  fid = c(1, 1, 1, 1),
  pid = c(1, 2, 3, 4),
  role = c("o", "m", "f", "pgf"),
  sex = c(1, 0, 1, 1),
  status = c(0, 0, 1, 1),
  age = c(22, 42, 48, 78),
  birth_year = 2023 - c(22, 42, 48, 78),
  aoo = c(NA, NA, 43, 45)
)

cip = data.frame(
  age = c(22, 42, 43, 45, 48, 78),
  birth_year = c(2001, 1981, 1975, 1945, 1975, 1945),
  sex = c(1, 0, 1, 1, 1, 1),
  cip = c(0.1, 0.2, 0.3, 0.3, 0.3, 0.4)
)

prepare_thresholds(.tbl = tbl, CIP = cip, age_col = "age", interpolation = NA)
#>   fid pid role sex status age birth_year aoo cip       thr     lower     upper
#> 1   1   1    o   1      0  22       2001  NA 0.1 1.2815516      -Inf 1.2815516
#> 2   1   2    m   0      0  42       1981  NA 0.2 0.8416212      -Inf 0.8416212
#> 3   1   3    f   1      1  48       1975  43 0.3 0.5244005 0.5244005       Inf
#> 4   1   4  pgf   1      1  78       1945  45 0.4 0.2533471 0.2533471       Inf