Title: | Piece-Wise Exponential Additive Mixed Modeling Tools for Survival Analysis |
---|---|
Description: | The Piece-wise exponential (Additive Mixed) Model (PAMM; Bender and others (2018) <doi: 10.1177/1471082X17748083>) is a powerful model class for the analysis of survival (or time-to-event) data, based on Generalized Additive (Mixed) Models (GA(M)Ms). It offers intuitive specification and robust estimation of complex survival models with stratified baseline hazards, random effects, time-varying effects, time-dependent covariates and cumulative effects (Bender and others (2019)), as well as support for left-truncated, competing risks and recurrent events data. pammtools provides tidy workflow for survival analysis with PAMMs, including data simulation, transformation and other functions for data preprocessing and model post-processing as well as visualization. |
Authors: | Andreas Bender [aut, cre] |
Maintainer: | Andreas Bender <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.6.01 |
Built: | 2025-01-29 22:20:53 UTC |
Source: | https://github.com/adibender/pammtools |
Add cumulative incidence function to data
add_cif(newdata, object, ...) ## Default S3 method: add_cif( newdata, object, ci = TRUE, overwrite = FALSE, alpha = 0.05, n_sim = 500L, cause_var = "cause", time_var = NULL, ... )
add_cif(newdata, object, ...) ## Default S3 method: add_cif( newdata, object, ci = TRUE, overwrite = FALSE, alpha = 0.05, n_sim = 500L, cause_var = "cause", time_var = NULL, ... )
newdata |
A data frame or list containing the values of the model covariates at which predictions
are required. If this is not provided then predictions corresponding to the
original data are returned. If |
object |
a fitted |
... |
Further arguments passed to |
ci |
|
overwrite |
Should hazard columns be overwritten if already present in
the data set? Defaults to |
alpha |
The alpha level for confidence/credible intervals. |
n_sim |
Number of simulations (draws from posterior of estimated coefficients) on which estimation of CIFs and their confidence/credible intervals will be based on. |
cause_var |
Character. Column name of the 'cause' variable. |
time_var |
Name of the variable used for the baseline hazard. If
not given, defaults to |
Add (cumulative) hazard based on the provided data set and model.
If ci=TRUE
confidence intervals (CI) are also added. Their width can
be controlled via the se_mult
argument. The method by which the
CI are calculated can be specified by ci_type
.
This is a wrapper around
predict.gam
. When reference
is specified, the
(log-)hazard ratio is calculated.
add_hazard(newdata, object, ...) ## Default S3 method: add_hazard( newdata, object, reference = NULL, type = c("response", "link"), ci = TRUE, se_mult = 2, ci_type = c("default", "delta", "sim"), overwrite = FALSE, time_var = NULL, ... ) add_cumu_hazard( newdata, object, ci = TRUE, se_mult = 2, overwrite = FALSE, time_var = NULL, interval_length = "intlen", ... )
add_hazard(newdata, object, ...) ## Default S3 method: add_hazard( newdata, object, reference = NULL, type = c("response", "link"), ci = TRUE, se_mult = 2, ci_type = c("default", "delta", "sim"), overwrite = FALSE, time_var = NULL, ... ) add_cumu_hazard( newdata, object, ci = TRUE, se_mult = 2, overwrite = FALSE, time_var = NULL, interval_length = "intlen", ... )
newdata |
A data frame or list containing the values of the model covariates at which predictions
are required. If this is not provided then predictions corresponding to the
original data are returned. If |
object |
a fitted |
... |
Further arguments passed to |
reference |
A data frame with number of rows equal to |
type |
Either |
ci |
|
se_mult |
Factor by which standard errors are multiplied for calculating the confidence intervals. |
ci_type |
The method by which standard errors/confidence intervals
will be calculated. Default transforms the linear predictor at
respective intervals. |
overwrite |
Should hazard columns be overwritten if already present in
the data set? Defaults to |
time_var |
Name of the variable used for the baseline hazard. If
not given, defaults to |
interval_length |
The variable in newdata containing the interval lengths.
Can be either bare unquoted variable name or character. Defaults to |
ped <- tumor[1:50,] %>% as_ped(Surv(days, status)~ age) pam <- mgcv::gam(ped_status ~ s(tend)+age, data = ped, family=poisson(), offset=offset) ped_info(ped) %>% add_hazard(pam, type="link") ped_info(ped) %>% add_hazard(pam, type = "response") ped_info(ped) %>% add_cumu_hazard(pam)
ped <- tumor[1:50,] %>% as_ped(Surv(days, status)~ age) pam <- mgcv::gam(ped_status ~ s(tend)+age, data = ped, family=poisson(), offset=offset) ped_info(ped) %>% add_hazard(pam, type="link") ped_info(ped) %>% add_hazard(pam, type = "response") ped_info(ped) %>% add_cumu_hazard(pam)
Given suitable data (i.e. data with all columns used for estimation of the model),
this functions adds a column surv_prob
containing survival probabilities
for the specified covariate and follow-up information (and CIs
surv_lower
, surv_upper
if ci=TRUE
).
add_surv_prob( newdata, object, ci = TRUE, se_mult = 2, overwrite = FALSE, time_var = NULL, interval_length = "intlen", ... )
add_surv_prob( newdata, object, ci = TRUE, se_mult = 2, overwrite = FALSE, time_var = NULL, interval_length = "intlen", ... )
newdata |
A data frame or list containing the values of the model covariates at which predictions
are required. If this is not provided then predictions corresponding to the
original data are returned. If |
object |
a fitted |
ci |
|
se_mult |
Factor by which standard errors are multiplied for calculating the confidence intervals. |
overwrite |
Should hazard columns be overwritten if already present in
the data set? Defaults to |
time_var |
Name of the variable used for the baseline hazard. If
not given, defaults to |
interval_length |
The variable in newdata containing the interval lengths.
Can be either bare unquoted variable name or character. Defaults to |
... |
Further arguments passed to |
ped <- tumor[1:50,] %>% as_ped(Surv(days, status)~ age) pam <- mgcv::gam(ped_status ~ s(tend)+age, data=ped, family=poisson(), offset=offset) ped_info(ped) %>% add_surv_prob(pam, ci=TRUE)
ped <- tumor[1:50,] %>% as_ped(Surv(days, status)~ age) pam <- mgcv::gam(ped_status ~ s(tend)+age, data=ped, family=poisson(), offset=offset) ped_info(ped) %>% add_surv_prob(pam, ci=TRUE)
Given a data set in standard format (with one row per subject/observation),
this function adds a column with the specified exposure time points
and a column with respective exposures, created from rng_fun
.
This function should usually only be used to create data sets passed
to sim_pexp
.
add_tdc(data, tz, rng_fun, ...)
add_tdc(data, tz, rng_fun, ...)
data |
A data set with variables specified in |
tz |
A numeric vector of exposure times (relative to the
beginning of the follow-up time |
rng_fun |
A random number generating function that creates
the time-dependent covariates at time points |
... |
Currently not used. |
Adds the contribution of a specific term to the
linear predictor to the data specified by newdata
.
Essentially a wrapper to predict.gam
, with type="terms"
.
Thus most arguments and their documentation below is from predict.gam
.
add_term(newdata, object, term, reference = NULL, ci = TRUE, se_mult = 2, ...)
add_term(newdata, object, term, reference = NULL, ci = TRUE, se_mult = 2, ...)
newdata |
A data frame or list containing the values of the model covariates at which predictions
are required. If this is not provided then predictions corresponding to the
original data are returned. If |
object |
a fitted |
term |
A character (vector) or regular expression indicating for which term(s) information should be extracted and added to data set. |
reference |
A data frame with number of rows equal to |
ci |
|
se_mult |
The factor by which standard errors are multiplied to form confidence intervals. |
... |
Further arguments passed to |
library(ggplot2) ped <- as_ped(tumor, Surv(days, status)~ age, cut = seq(0, 2000, by = 100)) pam <- mgcv::gam(ped_status ~ s(tend) + s(age), family = poisson(), offset = offset, data = ped) #term contribution for sequence of ages s_age <- ped %>% make_newdata(age = seq_range(age, 50)) %>% add_term(pam, term = "age") ggplot(s_age, aes(x = age, y = fit)) + geom_line() + geom_ribbon(aes(ymin = ci_lower, ymax = ci_upper), alpha = .3) # term contribution relative to mean age s_age2 <- ped %>% make_newdata(age = seq_range(age, 50)) %>% add_term(pam, term = "age", reference = list(age = mean(.$age))) ggplot(s_age2, aes(x = age, y = fit)) + geom_line() + geom_ribbon(aes(ymin = ci_lower, ymax = ci_upper), alpha = .3)
library(ggplot2) ped <- as_ped(tumor, Surv(days, status)~ age, cut = seq(0, 2000, by = 100)) pam <- mgcv::gam(ped_status ~ s(tend) + s(age), family = poisson(), offset = offset, data = ped) #term contribution for sequence of ages s_age <- ped %>% make_newdata(age = seq_range(age, 50)) %>% add_term(pam, term = "age") ggplot(s_age, aes(x = age, y = fit)) + geom_line() + geom_ribbon(aes(ymin = ci_lower, ymax = ci_upper), alpha = .3) # term contribution relative to mean age s_age2 <- ped %>% make_newdata(age = seq_range(age, 50)) %>% add_term(pam, term = "age", reference = list(age = mean(.$age))) ggplot(s_age2, aes(x = age, y = fit)) + geom_line() + geom_ribbon(aes(ymin = ci_lower, ymax = ci_upper), alpha = .3)
Aas.data.frame
S3 method for objects of class crps
.
## S3 method for class 'crps' as.data.frame(x, row.names = NULL, optional = FALSE, ...)
## S3 method for class 'crps' as.data.frame(x, row.names = NULL, optional = FALSE, ...)
x |
An object of class |
row.names |
|
optional |
logical. If |
... |
additional arguments to be passed to or from methods. |
patient
data set.This data set contains the time-dependent covariates (TDCs) for the patient
data set. Note that nutrition was protocoled for at most 12 days after
ICU admission. The data set includes:
Unique patient identifier. Can be used to merge with
patient
data
The calendar (!) day at which calories (or proteins) were administered
The percentage of target calories supplied to the patient by the ICU staff
The amount of protein supplied to the patient by the ICU staff
daily
daily
An object of class tbl_df
(inherits from tbl
, data.frame
) with 18797 rows and 4 columns.
geom_hazard
is an extension of the geom_line
, and
is optimized for (cumulative) hazard plots. Essentially, it adds a (0,0)
row to the data, if not already the case. Stolen from the
RmcdrPlugin.KMggplot2
(slightly modified).
geom_hazard( mapping = NULL, data = NULL, stat = "identity", position = "identity", na.rm = FALSE, show.legend = NA, inherit.aes = TRUE, ... ) geom_stephazard( mapping = NULL, data = NULL, stat = "identity", position = "identity", direction = "vh", na.rm = FALSE, show.legend = NA, inherit.aes = TRUE, ... ) geom_surv( mapping = NULL, data = NULL, stat = "identity", position = "identity", na.rm = FALSE, show.legend = NA, inherit.aes = TRUE, ... )
geom_hazard( mapping = NULL, data = NULL, stat = "identity", position = "identity", na.rm = FALSE, show.legend = NA, inherit.aes = TRUE, ... ) geom_stephazard( mapping = NULL, data = NULL, stat = "identity", position = "identity", direction = "vh", na.rm = FALSE, show.legend = NA, inherit.aes = TRUE, ... ) geom_surv( mapping = NULL, data = NULL, stat = "identity", position = "identity", na.rm = FALSE, show.legend = NA, inherit.aes = TRUE, ... )
mapping |
Set of aesthetic mappings created by |
data |
The data to be displayed in this layer. There are three options: If A A |
stat |
The statistical transformation to use on the data for this layer.
When using a
|
position |
A position adjustment to use on the data for this layer. This
can be used in various ways, including to prevent overplotting and
improving the display. The
|
na.rm |
If |
show.legend |
logical. Should this layer be included in the legends?
|
inherit.aes |
If |
... |
Other arguments passed on to
|
direction |
direction of stairs: 'vh' for vertical then horizontal, 'hv' for horizontal then vertical, or 'mid' for step half-way between adjacent x-values. |
library(ggplot2) library(pammtools) ped <- tumor[10:50,] %>% as_ped(Surv(days, status)~1) pam <- mgcv::gam(ped_status ~ s(tend), data=ped, family = poisson(), offset = offset) ndf <- make_newdata(ped, tend = unique(tend)) %>% add_hazard(pam) # piece-wise constant hazards ggplot(ndf, aes(x = tend, y = hazard)) + geom_vline(xintercept = c(0, ndf$tend[c(1, (nrow(ndf)-2):nrow(ndf))]), lty = 3) + geom_hline(yintercept = c(ndf$hazard[1:3], ndf$hazard[nrow(ndf)]), lty = 3) + geom_stephazard() + geom_step(col=2) + geom_step(col=2, lty = 2, direction="vh") # comulative hazard ndf <- ndf %>% add_cumu_hazard(pam) ggplot(ndf, aes(x = tend, y = cumu_hazard)) + geom_hazard() + geom_line(col=2) # doesn't start at (0, 0) # survival probability ndf <- ndf %>% add_surv_prob(pam) ggplot(ndf, aes(x = tend, y = surv_prob)) + geom_surv() + geom_line(col=2) # doesn't start at c(0,1)
library(ggplot2) library(pammtools) ped <- tumor[10:50,] %>% as_ped(Surv(days, status)~1) pam <- mgcv::gam(ped_status ~ s(tend), data=ped, family = poisson(), offset = offset) ndf <- make_newdata(ped, tend = unique(tend)) %>% add_hazard(pam) # piece-wise constant hazards ggplot(ndf, aes(x = tend, y = hazard)) + geom_vline(xintercept = c(0, ndf$tend[c(1, (nrow(ndf)-2):nrow(ndf))]), lty = 3) + geom_hline(yintercept = c(ndf$hazard[1:3], ndf$hazard[nrow(ndf)]), lty = 3) + geom_stephazard() + geom_step(col=2) + geom_step(col=2, lty = 2, direction="vh") # comulative hazard ndf <- ndf %>% add_cumu_hazard(pam) ggplot(ndf, aes(x = tend, y = cumu_hazard)) + geom_hazard() + geom_line(col=2) # doesn't start at (0, 0) # survival probability ndf <- ndf %>% add_surv_prob(pam) ggplot(ndf, aes(x = tend, y = surv_prob)) + geom_surv() + geom_line(col=2) # doesn't start at c(0,1)
geom_stepribbon
is an extension of the geom_ribbon
, and
is optimized for Kaplan-Meier plots with pointwise confidence intervals
or a confidence band. The default direction
-argument "hv"
is
appropriate for right-continuous step functions like the hazard rates etc
returned by pammtools
.
geom_stepribbon( mapping = NULL, data = NULL, stat = "identity", position = "identity", direction = "hv", na.rm = FALSE, show.legend = NA, inherit.aes = TRUE, ... )
geom_stepribbon( mapping = NULL, data = NULL, stat = "identity", position = "identity", direction = "hv", na.rm = FALSE, show.legend = NA, inherit.aes = TRUE, ... )
mapping |
Set of aesthetic mappings created by |
data |
The data to be displayed in this layer. There are three options: If A A |
stat |
The statistical transformation to use on the data for this layer.
When using a
|
position |
A position adjustment to use on the data for this layer. This
can be used in various ways, including to prevent overplotting and
improving the display. The
|
direction |
direction of stairs: 'vh' for vertical then horizontal, 'hv' for horizontal then vertical, or 'mid' for step half-way between adjacent x-values. |
na.rm |
If |
show.legend |
logical. Should this layer be included in the legends?
|
inherit.aes |
If |
... |
Other arguments passed on to
|
geom_ribbon
geom_stepribbon
inherits from geom_ribbon
.
library(ggplot2) huron <- data.frame(year = 1875:1972, level = as.vector(LakeHuron)) h <- ggplot(huron, aes(year)) h + geom_stepribbon(aes(ymin = level - 1, ymax = level + 1), fill = "grey70") + geom_step(aes(y = level)) h + geom_ribbon(aes(ymin = level - 1, ymax = level + 1), fill = "grey70") + geom_line(aes(y = level))
library(ggplot2) huron <- data.frame(year = 1875:1972, level = as.vector(LakeHuron)) h <- ggplot(huron, aes(year)) h + geom_stepribbon(aes(ymin = level - 1, ymax = level + 1), fill = "grey70") + geom_step(aes(y = level)) h + geom_ribbon(aes(ymin = level - 1, ymax = level + 1), fill = "grey70") + geom_line(aes(y = level))
These functions are designed to extract (or mimic) the cumulative coefficients
usually used in additive hazards models (Aalen model) to depict (time-varying)
covariate effects. For PAMMs, these are the differences
between the cumulative hazard rates where all covariates except one have the
identical values. For a numeric covariate of interest, this calculates
. For non-numeric covariates
the cumulative hazard of the reference level is subtracted from
the cumulative hazards evaluated at all non reference levels. Standard
errors are calculated using the delta method.
get_cumu_coef(model, data = NULL, terms, ...) ## S3 method for class 'gam' get_cumu_coef(model, data, terms, ...) ## S3 method for class 'aalen' get_cumu_coef(model, data = NULL, terms, ci = TRUE, ...) ## S3 method for class 'cox.aalen' get_cumu_coef(model, data = NULL, terms, ci = TRUE, ...)
get_cumu_coef(model, data = NULL, terms, ...) ## S3 method for class 'gam' get_cumu_coef(model, data, terms, ...) ## S3 method for class 'aalen' get_cumu_coef(model, data = NULL, terms, ci = TRUE, ...) ## S3 method for class 'cox.aalen' get_cumu_coef(model, data = NULL, terms, ci = TRUE, ...)
model |
Object from which to extract cumulative coefficients. |
data |
Additional data if necessary. |
terms |
A character vector of variables for which the cumulative coefficient should be calculated. |
... |
Further arguments passed to methods. |
ci |
Logical. Indicates if confidence intervals should be returned as well. |
Calculate (or plot) cumulative effect for all time-points of the follow-up
get_cumu_eff(data, model, term, z1, z2 = NULL, se_mult = 2) gg_cumu_eff(data, model, term, z1, z2 = NULL, se_mult = 2, ci = TRUE)
get_cumu_eff(data, model, term, z1, z2 = NULL, se_mult = 2) gg_cumu_eff(data, model, term, z1, z2 = NULL, se_mult = 2, ci = TRUE)
data |
Data used to fit the |
model |
A suitable model object which will be used to estimate the
partial effect of |
term |
A character string indicating the model term for which partial effects should be plotted. |
z1 |
The exposure profile for which to calculate the cumulative effect. Can be either a single number or a vector of same length as unique observation time points. |
z2 |
If provided, calculated cumulative effect is for the difference between the two exposure profiles (g(z1,t)-g(z2,t)). |
se_mult |
Multiplicative factor used to calculate confidence intervals (e.g., lower = fit - 2*se). |
ci |
Logical. Indicates if confidence intervals for the |
Information on intervals in which times fall
get_intervals(x, times, ...) ## Default S3 method: get_intervals(x, times, left.open = TRUE, rightmost.closed = TRUE, ...)
get_intervals(x, times, ...) ## Default S3 method: get_intervals(x, times, left.open = TRUE, rightmost.closed = TRUE, ...)
x |
An object from which interval information can be obtained,
see |
times |
A vector of times for which corresponding interval information should be returned. |
... |
Further arguments passed to |
left.open |
logical; if true all the intervals are open at left
and closed at right; in the formulas below, |
rightmost.closed |
logical; if true, the rightmost interval,
|
A data.frame
containing information on intervals in which
values of times
fall.
set.seed(111018) brks <- c(0, 4.5, 5, 10, 30) int_info(brks) x <- runif (3, 0, 30) x get_intervals(brks, x)
set.seed(111018) brks <- c(0, 4.5, 5, 10, 30) int_info(brks) x <- runif (3, 0, 30) x get_intervals(brks, x)
Constructs lag-lead window data set from raw inputs or from data objects
with suitable information stored in attributes, e.g., objects created
by as_ped
.
get_laglead(x, ...) ## Default S3 method: get_laglead(x, tz, ll_fun, ...) ## S3 method for class 'data.frame' get_laglead(x, ...)
get_laglead(x, ...) ## Default S3 method: get_laglead(x, tz, ll_fun, ...) ## S3 method for class 'data.frame' get_laglead(x, ...)
x |
Either a numeric vector of follow-up cut points or a suitable object. |
... |
Further arguments passed to methods. |
tz |
A vector of exposure times |
ll_fun |
Function that specifies how the lag-lead matrix should be constructed. First argument is the follow up time second argument is the time of exposure. |
get_laglead(0:10, tz=-5:5, ll_fun=function(t, tz) { t >= tz + 2 & t <= tz + 2 + 3}) gg_laglead(0:10, tz=-5:5, ll_fun=function(t, tz) { t >= tz + 2 & t <= tz + 2 + 3})
get_laglead(0:10, tz=-5:5, ll_fun=function(t, tz) { t >= tz + 2 & t <= tz + 2 + 3}) gg_laglead(0:10, tz=-5:5, ll_fun=function(t, tz) { t >= tz + 2 & t <= tz + 2 + 3})
Given a mgcv
gamObject
, returns the information
used for the default plots produced by plot.gam
.
get_plotinfo(x, ...)
get_plotinfo(x, ...)
x |
a fitted |
... |
Further arguments passed to |
This function basically creates a new df
from data
for
each term in terms
, creating a range from minimum and maximum of the
predict(fit, newdata=df, type="terms")
. Terms are then
stacked to a tidy data frame.
get_terms(data, fit, terms, ...)
get_terms(data, fit, terms, ...)
data |
A data frame containing variables used to fit the model. Only first row will be used. |
fit |
A fitted object of class |
terms |
A character vector (can be length one). Specifies the terms for which partial effects will be returned |
... |
Further arguments passed to |
A tibble with 5 columns.
library(survival) fit <- coxph(Surv(time, status) ~ pspline(karno) + pspline(age), data=veteran) terms_df <- veteran %>% get_terms(fit, terms = c("karno", "age")) head(terms_df) tail(terms_df)
library(survival) fit <- coxph(Surv(time, status) ~ pspline(karno) + pspline(age), data=veteran) terms_df <- veteran %>% get_terms(fit, terms = c("karno", "age")) head(terms_df) tail(terms_df)
Given a model object, returns a data frame with columns variable
,
coef
(coefficient), ci_lower
(lower 95\
ci_upper
(upper 95\
gg_fixed(x, intercept = FALSE, ...)
gg_fixed(x, intercept = FALSE, ...)
x |
A model object. |
intercept |
Logical, indicating whether intercept term should be included.
Defaults to |
... |
Currently not used. |
g <- mgcv::gam(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width + Species, data=iris) gg_fixed(g, intercept=TRUE) gg_fixed(g)
g <- mgcv::gam(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width + Species, data=iris) gg_fixed(g, intercept=TRUE) gg_fixed(g)
Given data defining a Lag-lead window, returns respective plot as a
ggplot2
object.
gg_laglead(x, ...) ## Default S3 method: gg_laglead(x, tz, ll_fun, ...) ## S3 method for class 'LL_df' gg_laglead( x, high_col = "grey20", low_col = "whitesmoke", grid_col = "lightgrey", ... ) ## S3 method for class 'nested_fdf' gg_laglead(x, ...)
gg_laglead(x, ...) ## Default S3 method: gg_laglead(x, tz, ll_fun, ...) ## S3 method for class 'LL_df' gg_laglead( x, high_col = "grey20", low_col = "whitesmoke", grid_col = "lightgrey", ... ) ## S3 method for class 'nested_fdf' gg_laglead(x, ...)
x |
Either a numeric vector of follow-up cut points or a suitable object. |
... |
Further arguments passed to methods. |
tz |
A vector of exposure times |
ll_fun |
Function that specifies how the lag-lead matrix should be constructed. First argument is the follow up time second argument is the time of exposure. |
high_col |
Color used to highlight exposure times within the lag-lead window. |
low_col |
Color of exposure times outside the lag-lead window. |
grid_col |
Color of grid lines. |
get_laglead
## Example 1: supply t, tz, ll_fun directly gg_laglead(1:10, tz=-5:5, ll_fun=function(t, tz) { t >= tz + 2 & t <= tz + 2 + 3}) ## Example 2: extract information on t, tz, ll_from data with respective attributes data("simdf_elra", package = "pammtools") gg_laglead(simdf_elra)
## Example 1: supply t, tz, ll_fun directly gg_laglead(1:10, tz=-5:5, ll_fun=function(t, tz) { t >= tz + 2 & t <= tz + 2 + 3}) ## Example 2: extract information on t, tz, ll_from data with respective attributes data("simdf_elra", package = "pammtools") gg_laglead(simdf_elra)
Depending on the plot function and input, creates either a 1-dimensional slices, bivariate surface or (1D) cumulative effect.
gg_partial(data, model, term, ..., reference = NULL, ci = TRUE) gg_partial_ll( data, model, term, ..., reference = NULL, ci = FALSE, time_var = "tend" ) get_partial_ll( data, model, term, ..., reference = NULL, ci = FALSE, time_var = "tend" )
gg_partial(data, model, term, ..., reference = NULL, ci = TRUE) gg_partial_ll( data, model, term, ..., reference = NULL, ci = FALSE, time_var = "tend" ) get_partial_ll( data, model, term, ..., reference = NULL, ci = FALSE, time_var = "tend" )
data |
Data used to fit the |
model |
A suitable model object which will be used to estimate the
partial effect of |
term |
A character string indicating the model term for which partial effects should be plotted. |
... |
Covariate specifications (expressions) that will be evaluated
by looking for variables in |
reference |
If specified, should be a list with covariate value pairs,
e.g. |
ci |
Logical. Indicates if confidence intervals for the |
time_var |
The name of the variable that was used in |
Plot Normal QQ plots for random effects
gg_re(x, ...)
gg_re(x, ...)
x |
a fitted |
... |
Further arguments passed to |
library(pammtools) data("patient") ped <- patient %>% dplyr::slice(1:100) %>% as_ped(Surv(Survdays, PatientDied)~ ApacheIIScore + CombinedicuID, id="CombinedID") pam <- mgcv::gam(ped_status ~ s(tend) + ApacheIIScore + s(CombinedicuID, bs="re"), data=ped, family=poisson(), offset=offset) gg_re(pam) plot(pam, select = 2)
library(pammtools) data("patient") ped <- patient %>% dplyr::slice(1:100) %>% as_ped(Surv(Survdays, PatientDied)~ ApacheIIScore + CombinedicuID, id="CombinedID") pam <- mgcv::gam(ped_status ~ s(tend) + ApacheIIScore + s(CombinedicuID, bs="re"), data=ped, family=poisson(), offset=offset) gg_re(pam) plot(pam, select = 2)
Flexible, high-level plotting function for (non-linear) effects conditional on further covariate specifications and potentially relative to a comparison specification.
gg_slice(data, model, term, ..., reference = NULL, ci = TRUE)
gg_slice(data, model, term, ..., reference = NULL, ci = TRUE)
data |
Data used to fit the |
model |
A suitable model object which will be used to estimate the
partial effect of |
term |
A character string indicating the model term for which partial effects should be plotted. |
... |
Covariate specifications (expressions) that will be evaluated
by looking for variables in |
reference |
If specified, should be a list with covariate value pairs,
e.g. |
ci |
Logical. Indicates if confidence intervals for the |
ped <- tumor[1:200, ] %>% as_ped(Surv(days, status) ~ . ) model <- mgcv::gam(ped_status~s(tend) + s(age, by = complications), data=ped, family = poisson(), offset=offset) make_newdata(ped, age = seq_range(age, 20), complications = levels(complications)) gg_slice(ped, model, "age", age=seq_range(age, 20), complications=levels(complications)) gg_slice(ped, model, "age", age=seq_range(age, 20), complications=levels(complications), ci = FALSE) gg_slice(ped, model, "age", age=seq_range(age, 20), complications=levels(complications), reference=list(age = 50))
ped <- tumor[1:200, ] %>% as_ped(Surv(days, status) ~ . ) model <- mgcv::gam(ped_status~s(tend) + s(age, by = complications), data=ped, family = poisson(), offset=offset) make_newdata(ped, age = seq_range(age, 20), complications = levels(complications)) gg_slice(ped, model, "age", age=seq_range(age, 20), complications=levels(complications)) gg_slice(ped, model, "age", age=seq_range(age, 20), complications=levels(complications), ci = FALSE) gg_slice(ped, model, "age", age=seq_range(age, 20), complications=levels(complications), reference=list(age = 50))
Given a gam model this convenience function returns a plot of all smooth terms contained in the model. If more than one smooth is present, the different smooth are faceted.
gg_smooth(x, ...) ## Default S3 method: gg_smooth(x, fit, ...)
gg_smooth(x, ...) ## Default S3 method: gg_smooth(x, fit, ...)
x |
A data frame or object of class |
... |
Further arguments passed to |
fit |
A model object. |
A ggplot
object.
get_terms
g1 <- mgcv::gam(Sepal.Length ~ s(Sepal.Width) + s(Petal.Length), data=iris) gg_smooth(iris, g1, terms=c("Sepal.Width", "Petal.Length"))
g1 <- mgcv::gam(Sepal.Length ~ s(Sepal.Width) + s(Petal.Length), data=iris) gg_smooth(iris, g1, terms=c("Sepal.Width", "Petal.Length"))
Given a gam model this convenience function returns a ggplot2
object
depicting 2d smooth terms specified in the model as heat/contour plots. If
more than one 2d smooth term is present individual terms are faceted.
gg_tensor(x, ci = FALSE, ...)
gg_tensor(x, ci = FALSE, ...)
x |
a fitted |
ci |
A logical value indicating whether confidence intervals should be
calculated and returned. Defaults to |
... |
Further arguments passed to |
g <- mgcv::gam(Sepal.Length ~ te(Sepal.Width, Petal.Length), data=iris) gg_tensor(g) gg_tensor(g, ci=TRUE) gg_tensor(update(g, .~. + te(Petal.Width, Petal.Length)))
g <- mgcv::gam(Sepal.Length ~ te(Sepal.Width, Petal.Length), data=iris) gg_tensor(g) gg_tensor(g, ci=TRUE) gg_tensor(update(g, .~. + te(Petal.Width, Petal.Length)))
This functions provides a flexible interface to create a data set that
can be plugged in as newdata
argument to a suitable predict
function (or similar).
The function is particularly useful in combination with one of the
add_*
functions, e.g., add_term
,
add_hazard
, etc.
make_newdata(x, ...) ## Default S3 method: make_newdata(x, ...) ## S3 method for class 'ped' make_newdata(x, ...) ## S3 method for class 'fped' make_newdata(x, ...)
make_newdata(x, ...) ## Default S3 method: make_newdata(x, ...) ## S3 method for class 'ped' make_newdata(x, ...) ## S3 method for class 'fped' make_newdata(x, ...)
x |
A data frame (or object that inherits from |
... |
Covariate specifications (expressions) that will be evaluated
by looking for variables in |
Depending on the type of variables in x
, mean or modus values
will be used for variables not specified in ellipsis
(see also sample_info
). If x
is an object
that inherits from class ped
, useful data set completion will be
attempted depending on variables specified in ellipsis. This is especially
useful, when creating a data set with different time points, e.g. to
calculate survival probabilities over time (add_surv_prob
)
or to calculate a time-varying covariate effects (add_term
).
To do so, the time variable has to be specified in ...
, e.g.,
tend = seq_range(tend, 20)
. The problem with this specification is that
not all values produced by seq_range(tend, 20)
will be actual values
of tend
used at the stage of estimation (and in general, it will
often be tedious to specify exact tend
values). make_newdata
therefore finds the correct interval and sets tend
to the respective
interval endpoint. For example, if the intervals of the PED object are
then
tend = 1.5
will be set to 2
and the
remaining time-varying information (e.g. offset) completed accordingly.
See examples below.
# General functionality tumor %>% make_newdata() tumor %>% make_newdata(age=c(50)) tumor %>% make_newdata(days=seq_range(days, 3), age=c(50, 55)) tumor %>% make_newdata(days=seq_range(days, 3), status=unique(status), age=c(50, 55)) # mean/modus values of unspecified variables are calculated over whole data tumor %>% make_newdata(sex=unique(sex)) tumor %>% group_by(sex) %>% make_newdata() # Examples for PED data ped <- tumor %>% slice(1:3) %>% as_ped(Surv(days, status)~., cut = c(0, 500, 1000)) ped %>% make_newdata(age=c(50, 55)) # if time information is specified, other time variables will be specified # accordingly and offset calculated correctly ped %>% make_newdata(tend = c(1000), age = c(50, 55)) ped %>% make_newdata(tend = unique(tend)) ped %>% group_by(sex) %>% make_newdata(tend = unique(tend)) # tend is set to the end point of respective interval: ped <- tumor %>% as_ped(Surv(days, status)~.) seq_range(ped$tend, 3) make_newdata(ped, tend = seq_range(tend, 3))
# General functionality tumor %>% make_newdata() tumor %>% make_newdata(age=c(50)) tumor %>% make_newdata(days=seq_range(days, 3), age=c(50, 55)) tumor %>% make_newdata(days=seq_range(days, 3), status=unique(status), age=c(50, 55)) # mean/modus values of unspecified variables are calculated over whole data tumor %>% make_newdata(sex=unique(sex)) tumor %>% group_by(sex) %>% make_newdata() # Examples for PED data ped <- tumor %>% slice(1:3) %>% as_ped(Surv(days, status)~., cut = c(0, 500, 1000)) ped %>% make_newdata(age=c(50, 55)) # if time information is specified, other time variables will be specified # accordingly and offset calculated correctly ped %>% make_newdata(tend = c(1000), age = c(50, 55)) ped %>% make_newdata(tend = unique(tend)) ped %>% group_by(sex) %>% make_newdata(tend = unique(tend)) # tend is set to the end point of respective interval: ped <- tumor %>% as_ped(Surv(days, status)~.) seq_range(ped$tend, 3) make_newdata(ped, tend = seq_range(tend, 3))
pammtools
provides functions and utilities that facilitate fitting
Piece-wise Exponential Additive Mixed Models (PAMMs), including data
transformation and other convenience functions for pre- and post-processing
as well as plotting.
The best way to get an overview of the functionality provided and how to fit PAMMs is to view the vignettes available at https://adibender.github.io/pammtools/articles/. A summary of the vignettes' content is given below:
basics: Introduction to PAMMs and basic modeling.
baseline: Shows how to estimate and visualize baseline model (without covariates) and comparison to respective Cox-PH model.
convenience: Convenience functions for post-processing and plotting PAMMs.
data-transformation: Transforming data into a format suitable to fit PAMMs.
frailty: Specifying "frailty" terms, i.e., random effects for PAMMs.
splines: Specifying spline smooth terms for PAMMs.
strata: Specifying stratified models in which each level of a grouping variable has a different baseline hazard.
tdcovar: Dealing with time-dependent covariates.
tveffects: Specifying time-varying effects.
left-truncation: Estimation for left-truncated data.
competing-risks: Competing risks analysis.
Maintainer: Andreas Bender [email protected] (ORCID)
Authors:
Fabian Scheipl [email protected] (ORCID)
Philipp Kopper [email protected] (ORCID)
Other contributors:
Lukas Burk [email protected] (ORCID) [contributor]
Bender, Andreas, Andreas Groll, and Fabian Scheipl. 2018. “A Generalized Additive Model Approach to Time-to-Event Analysis” Statistical Modelling, February. https://doi.org/10.1177/1471082X17748083.
Bender, Andreas, Fabian Scheipl, Wolfgang Hartl, Andrew G. Day, and Helmut Küchenhoff. 2019. “Penalized Estimation of Complex, Non-Linear Exposure-Lag-Response Associations.” Biostatistics 20 (2): 315–31. https://doi.org/10.1093/biostatistics/kxy003.
Bender, Andreas, and Fabian Scheipl. 2018. “pammtools: Piece-Wise Exponential Additive Mixed Modeling Tools.” ArXiv:1806.01042 Stat, June. https://arxiv.org/abs/1806.01042.
Useful links:
A data set containing the survival time (or hospital release time) among other covariates. The full data is available here. The following variables are provided:
The year of ICU Admission
Intensive Care Unit (ICU) ID
Patient identificator
Survival time of patients. Here it is assumed that patients survive until t=30 if released from hospital.
Status indicator; 1=death, 0=censoring
Survival time in hospital. Here it is assumed that patients are censored at time of hospital release (potentially informative)
Male or female
The patients age at Admission
Admission category: medical, surgical elective or surgical emergency
The patient's Apache II Score at Admission
Patient's Body Mass Index
Diagnosis at admission in 9 categories
patient
patient
An object of class data.frame
with 2000 rows and 12 columns.
Given an object of class ped
, returns data frame with one row for each
interval containing interval information, mean values for numerical
variables and modus for non-numeric variables in the data set.
ped_info(ped) ## S3 method for class 'ped' ped_info(ped)
ped_info(ped) ## S3 method for class 'ped' ped_info(ped)
ped |
An object of class |
A data frame with one row for each unique interval in ped
.
ped <- tumor[1:4,] %>% as_ped(Surv(days, status)~ sex + age) ped_info(ped)
ped <- tumor[1:4,] %>% as_ped(Surv(days, status)~ sex + age) ped_info(ped)
S3 method for pamm objects for compatibility with package pec
## S3 method for class 'pamm' predictSurvProb(object, newdata, times, ...)
## S3 method for class 'pamm' predictSurvProb(object, newdata, times, ...)
object |
A fitted model from which to extract predicted survival probabilities |
newdata |
A data frame containing predictor variable combinations for which to compute predicted survival probabilities. |
times |
A vector of times in the range of the response variable, e.g. times when the response is a survival object, at which to return the survival probabilities. |
... |
Additional arguments that are passed on to the current method. |
Stolen from here
seq_range(x, n, by, trim = NULL, expand = NULL, pretty = FALSE)
seq_range(x, n, by, trim = NULL, expand = NULL, pretty = FALSE)
x |
A numeric vector |
n , by
|
Specify the output sequence either by supplying the
length of the sequence with I recommend that you name these arguments in order to make it clear to the reader. |
trim |
Optionally, trim values off the tails.
|
expand |
Optionally, expand the range by |
pretty |
If |
x <- rcauchy(100) seq_range(x, n = 10) seq_range(x, n = 10, trim = 0.1) seq_range(x, by = 1, trim = 0.1) # Make pretty sequences y <- runif (100) seq_range(y, n = 10) seq_range(y, n = 10, pretty = TRUE) seq_range(y, n = 10, expand = 0.5, pretty = TRUE) seq_range(y, by = 0.1) seq_range(y, by = 0.1, pretty = TRUE)
x <- rcauchy(100) seq_range(x, n = 10) seq_range(x, n = 10, trim = 0.1) seq_range(x, by = 1, trim = 0.1) # Make pretty sequences y <- runif (100) seq_range(y, n = 10) seq_range(y, n = 10, pretty = TRUE) seq_range(y, n = 10, expand = 0.5, pretty = TRUE) seq_range(y, by = 0.1) seq_range(y, by = 0.1, pretty = TRUE)
Simulate survival times from the piece-wise exponential distribution
sim_pexp(formula, data, cut)
sim_pexp(formula, data, cut)
formula |
An extended formula that specifies the linear predictor.
If you want to include a smooth baseline
or time-varying effects, use |
data |
A data set with variables specified in |
cut |
A sequence of time-points starting with 0. |
library(survival) library(dplyr) library(pammtools) # set number of observations/subjects n <- 250 # create data set with variables which will affect the hazard rate. df <- cbind.data.frame(x1 = runif (n, -3, 3), x2 = runif (n, 0, 6)) %>% as_tibble() # the formula which specifies how covariates affet the hazard rate f0 <- function(t) { dgamma(t, 8, 2) *6 } form <- ~ -3.5 + f0(t) -0.5*x1 + sqrt(x2) set.seed(24032018) sim_df <- sim_pexp(form, df, 1:10) head(sim_df) plot(survfit(Surv(time, status)~1, data = sim_df )) # for control, estimate with Cox PH mod <- coxph(Surv(time, status) ~ x1 + pspline(x2), data=sim_df) coef(mod)[1] layout(matrix(1:2, nrow=1)) termplot(mod, se = TRUE) # and using PAMs layout(1) ped <- sim_df %>% as_ped(Surv(time, status)~., max_time=10) library(mgcv) pam <- gam(ped_status ~ s(tend) + x1 + s(x2), data=ped, family=poisson, offset=offset) coef(pam)[2] plot(pam, page=1) ## Not run: # Example 2: Functional covariates/cumulative coefficients # function to generate one exposure profile, tz is a vector of time points # at which TDC z was observed rng_z = function(nz) { as.numeric(arima.sim(n = nz, list(ar = c(.8, -.6)))) } # two different exposure times for two different exposures tz1 <- 1:10 tz2 <- -5:5 # generate exposures and add to data set df <- df %>% add_tdc(tz1, rng_z) %>% add_tdc(tz2, rng_z) df # define tri-variate function of time, exposure time and exposure z ft <- function(t, tmax) { -1*cos(t/tmax*pi) } fdnorm <- function(x) (dnorm(x,1.5,2)+1.5*dnorm(x,7.5,1)) wpeak2 <- function(lag) 15*dnorm(lag,8,10) wdnorm <- function(lag) 5*(dnorm(lag,4,6)+dnorm(lag,25,4)) f_xyz1 <- function(t, tz, z) { ft(t, tmax=10) * 0.8*fdnorm(z)* wpeak2(t - tz) } f_xyz2 <- function(t, tz, z) { wdnorm(t-tz) * z } # define lag-lead window function ll_fun <- function(t, tz) {t >= tz} ll_fun2 <- function(t, tz) {t - 2 >= tz} # simulate data with cumulative effect sim_df <- sim_pexp( formula = ~ -3.5 + f0(t) -0.5*x1 + sqrt(x2)| fcumu(t, tz1, z.tz1, f_xyz=f_xyz1, ll_fun=ll_fun) + fcumu(t, tz2, z.tz2, f_xyz=f_xyz2, ll_fun=ll_fun2), data = df, cut = 0:10) ## End(Not run)
library(survival) library(dplyr) library(pammtools) # set number of observations/subjects n <- 250 # create data set with variables which will affect the hazard rate. df <- cbind.data.frame(x1 = runif (n, -3, 3), x2 = runif (n, 0, 6)) %>% as_tibble() # the formula which specifies how covariates affet the hazard rate f0 <- function(t) { dgamma(t, 8, 2) *6 } form <- ~ -3.5 + f0(t) -0.5*x1 + sqrt(x2) set.seed(24032018) sim_df <- sim_pexp(form, df, 1:10) head(sim_df) plot(survfit(Surv(time, status)~1, data = sim_df )) # for control, estimate with Cox PH mod <- coxph(Surv(time, status) ~ x1 + pspline(x2), data=sim_df) coef(mod)[1] layout(matrix(1:2, nrow=1)) termplot(mod, se = TRUE) # and using PAMs layout(1) ped <- sim_df %>% as_ped(Surv(time, status)~., max_time=10) library(mgcv) pam <- gam(ped_status ~ s(tend) + x1 + s(x2), data=ped, family=poisson, offset=offset) coef(pam)[2] plot(pam, page=1) ## Not run: # Example 2: Functional covariates/cumulative coefficients # function to generate one exposure profile, tz is a vector of time points # at which TDC z was observed rng_z = function(nz) { as.numeric(arima.sim(n = nz, list(ar = c(.8, -.6)))) } # two different exposure times for two different exposures tz1 <- 1:10 tz2 <- -5:5 # generate exposures and add to data set df <- df %>% add_tdc(tz1, rng_z) %>% add_tdc(tz2, rng_z) df # define tri-variate function of time, exposure time and exposure z ft <- function(t, tmax) { -1*cos(t/tmax*pi) } fdnorm <- function(x) (dnorm(x,1.5,2)+1.5*dnorm(x,7.5,1)) wpeak2 <- function(lag) 15*dnorm(lag,8,10) wdnorm <- function(lag) 5*(dnorm(lag,4,6)+dnorm(lag,25,4)) f_xyz1 <- function(t, tz, z) { ft(t, tmax=10) * 0.8*fdnorm(z)* wpeak2(t - tz) } f_xyz2 <- function(t, tz, z) { wdnorm(t-tz) * z } # define lag-lead window function ll_fun <- function(t, tz) {t >= tz} ll_fun2 <- function(t, tz) {t - 2 >= tz} # simulate data with cumulative effect sim_df <- sim_pexp( formula = ~ -3.5 + f0(t) -0.5*x1 + sqrt(x2)| fcumu(t, tz1, z.tz1, f_xyz=f_xyz1, ll_fun=ll_fun) + fcumu(t, tz2, z.tz2, f_xyz=f_xyz2, ll_fun=ll_fun2), data = df, cut = 0:10) ## End(Not run)
This is data simulated using the sim_pexp
function.
It contains two time-constant and two time-dependent covariates (observed
on different exposure time grids). The code used for simulation is
contained in the examples of ?sim_pexp
.
simdf_elra
simdf_elra
An object of class nested_fdf
(inherits from sim_df
, tbl_df
, tbl
, data.frame
) with 250 rows and 9 columns.
This dataset originates from the Drakenstein child health study. The data contains the following variables:
Randomly generated unique child ID
The time at which the child enters the risk set for the $k$-th event
Time of $k$-th infection or censoring
.
Event number. Maximum of 6.
staph
staph
An object of class tbl_df
(inherits from tbl
, data.frame
) with 374 rows and 6 columns.
Extract random effects in tidy data format.
tidy_re(x, keep = c("fit", "main", "xlab", "ylab"), ...)
tidy_re(x, keep = c("fit", "main", "xlab", "ylab"), ...)
x |
a fitted |
keep |
A vector of variables to keep. |
... |
Further arguments passed to |
Extract 1d smooth objects in tidy data format.
tidy_smooth(x, keep = c("x", "fit", "se", "xlab", "ylab"), ci = TRUE, ...)
tidy_smooth(x, keep = c("x", "fit", "se", "xlab", "ylab"), ci = TRUE, ...)
x |
a fitted |
keep |
A vector of variables to keep. |
ci |
A logical value indicating whether confidence intervals should be
calculated and returned. Defaults to |
... |
Further arguments passed to |
Extract 2d smooth objects in tidy format.
tidy_smooth2d( x, keep = c("x", "y", "fit", "se", "xlab", "ylab", "main"), ci = FALSE, ... )
tidy_smooth2d( x, keep = c("x", "y", "fit", "se", "xlab", "ylab", "main"), ci = FALSE, ... )
x |
a fitted |
keep |
A vector of variables to keep. |
ci |
A logical value indicating whether confidence intervals should be
calculated and returned. Defaults to |
... |
Further arguments passed to |
Information on patients treated for a cancer disease located in the stomach area. The data set includes:
Time from operation until death in days.
Event indicator (0 = censored, 1 = death).
The subject's age.
The subject's sex (male/female).
Charlson comorbidity score, 1-6.
Has subject received transfusions (no/yes).
Did major complications occur during operation (no/yes).
Did the tumor develop metastases? (no/yes).
Was the operation accompanied by a major resection (no/yes).
tumor
tumor
An object of class tbl_df
(inherits from tbl
, data.frame
) with 776 rows and 9 columns.