7.1 Survival Analysis

Survival analysis examines data on whether a specific event of interest takes place and how long it takes until this event occurs. Ordinary regression analysis is not suitable for survival data sets, for two reasons. Firstly, survival times take only positive values, so they would need to be transformed before fitting an ordinary regression model to avoid biased estimates. Secondly, ordinary regression analysis cannot handle censored observations appropriately. Censored observations are observations in which the event of interest has not (yet) occurred by the end of the observation period, so only a lower bound on the true survival time is known. Survival analysis can make use of such censored data, where the observation window ended before the event of interest was observed. Survival models therefore use both censored and uncensored observations when estimating their parameters.
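To make the censoring problem concrete, the following base-R sketch simulates exponential survival times with a true mean of 10 and censors them at uniformly distributed follow-up times. Both distributions and the sample size are illustrative assumptions, not part of mlr3proba. Averaging the observed times as if they were complete underestimates the mean, whereas the exponential maximum-likelihood estimate (total observed time divided by the number of events) uses censored and uncensored observations alike and recovers it.

```r
set.seed(42)
n = 10000

# hypothetical data: true event times with mean 10, censoring times on [0, 20]
event_time = rexp(n, rate = 0.1)
censor_time = runif(n, min = 0, max = 20)

# what is actually observed: the earlier of the two times plus an event indicator
time = pmin(event_time, censor_time)
status = as.integer(event_time <= censor_time)  # 1 = event observed, 0 = censored

# naive estimate: treats censored times as if they were event times
naive_mean = mean(time)

# exponential MLE: censored rows contribute follow-up time but no event
mle_mean = sum(time) / sum(status)

cat(sprintf("naive: %.2f, censoring-aware: %.2f (true: 10)\n", naive_mean, mle_mean))
```

With these settings the naive mean is about 5.7 in expectation, far below the true mean of 10, because every censored time is shorter than the event time it stands in for.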

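The package mlr3proba extends mlr3 with the following objects for survival analysis:

- TaskSurv to define (censored) survival tasks
- LearnerSurv as base class for survival learners
- PredictionSurv as specialized class for Prediction objects
- MeasureSurv as specialized class for performance measures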

In this example, we demonstrate the basic functionality of the package on the rats data from the survival package. This data set ships with mlr3proba as a pre-defined TaskSurv.

library(mlr3proba)
task = tsk("rats")
print(task)
## <TaskSurv:rats> (300 x 5)
## * Target: time, status
## * Properties: -
## * Features (3):
##   - int (2): litter, rx
##   - fct (1): sex
mlr3viz::autoplot(task)
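Beyond printing, the task can be queried through the standard mlr3 Task fields. The sketch below assumes mlr3 and mlr3proba are installed. It checks the task class and retrieves the row count, the target and feature names, and the underlying data, which should agree with the printout above.

```r
library(mlr3)
library(mlr3proba)

task = tsk("rats")

# TaskSurv is the survival-specific task class provided by mlr3proba
stopifnot(inherits(task, "TaskSurv"))

task$nrow           # number of observations
task$target_names   # the two target columns: survival time and event indicator
task$feature_names  # the features available for modelling

# first rows of the underlying data, target columns included
head(task$data())
```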


# the two target columns are combined into a survival object:
head(task$truth())
## [1] 101+  49  104+  91+ 104+ 102+
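The `+` suffix in this output marks censored observations: `101+` means the rat was still event-free when its follow-up ended at time 101, whereas `49` is an observed event. The sketch below builds the same kind of object directly with `survival::Surv()` from the six time/status pairs shown above (status 1 = event, 0 = censored).

```r
library(survival)

# first six rows of the rats target, as printed above
time = c(101, 49, 104, 91, 104, 102)
status = c(0, 1, 0, 0, 0, 0)  # 1 = event observed, 0 = right-censored

y = Surv(time, status)
print(y)  # censored times are printed with a trailing "+"

# underneath, the object is a two-column matrix of times and event indicators
m = as.matrix(y)
colnames(m)
```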

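Now, we conduct a small benchmark study on the rats task, comparing three of the integrated survival learners (a Cox proportional hazards model, the Kaplan-Meier estimator, and a random survival forest from ranger) with 3-fold cross-validation and Harrell's C-index: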

# integrated learners
learners = lapply(c("surv.coxph", "surv.kaplan", "surv.ranger"), lrn)
print(learners)
## [[1]]
## <LearnerSurvCoxPH:surv.coxph>
## * Model: -
## * Parameters: list()
## * Packages: survival, distr6
## * Predict Type: distr
## * Feature types: logical, integer, numeric, factor
## * Properties: importance
## 
## [[2]]
## <LearnerSurvKaplan:surv.kaplan>
## * Model: -
## * Parameters: list()
## * Packages: survival, distr6
## * Predict Type: crank
## * Feature types: logical, integer, numeric, character, factor, ordered
## * Properties: missings
## 
## [[3]]
## <LearnerSurvRanger:surv.ranger>
## * Model: -
## * Parameters: list()
## * Packages: ranger, distr6
## * Predict Type: distr
## * Feature types: logical, integer, numeric, character, factor, ordered
## * Properties: importance, oob_error, weights
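The surv.kaplan learner fits the Kaplan-Meier estimator, a featureless baseline that ignores the covariates: S(t) = ∏ (1 − d_i / n_i) over all event times t_i ≤ t, where d_i is the number of events at t_i and n_i the number of subjects still at risk just before t_i. As a sketch of what this baseline computes, the base-R function below (the name `km_by_hand` is hypothetical) implements the product directly and is checked against `survival::survfit()` on the raw rats data.

```r
library(survival)

rats = survival::rats

# Kaplan-Meier estimate evaluated at each distinct event time
km_by_hand = function(time, status) {
  event_times = sort(unique(time[status == 1]))
  # subjects still under observation just before each event time
  n_risk = vapply(event_times, function(s) sum(time >= s), integer(1))
  # events occurring exactly at each event time
  n_event = vapply(event_times, function(s) sum(time == s & status == 1), integer(1))
  data.frame(time = event_times, surv = cumprod(1 - n_event / n_risk))
}

km = km_by_hand(rats$time, rats$status)

# reference: intercept-only survfit; summary() lists event times only by default
ref = summary(survfit(Surv(time, status) ~ 1, data = rats))
head(km)
```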

measure = msr("surv.harrellC")
print(measure)
## <MeasureSurvHarrellC:surv.harrellC>
## * Packages: -
## * Range: [0, 1]
## * Minimize: FALSE
## * Properties: -
## * Predict type: crank
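Harrell's C is the fraction of comparable pairs that a model orders correctly. A pair (i, j) is comparable when subject i has an observed event and t_i < t_j; it is concordant when i was also assigned the higher predicted risk. The base-R sketch below follows that definition on toy data. The helper `harrell_c` is hypothetical, it skips pairs with tied times (real implementations treat ties more carefully), and it assumes the mlr3proba convention that a larger crank means higher risk. A value of 1 means perfect ordering, 0.5 means no better than chance, which is why the measure ranges over [0, 1] and is maximized.

```r
# Harrell's C: share of comparable pairs whose predicted risks are ordered correctly
harrell_c = function(time, status, risk) {
  concordant = 0
  comparable = 0
  for (i in seq_along(time)) {
    if (status[i] != 1) next  # censored subjects cannot anchor a comparable pair
    for (j in seq_along(time)) {
      if (time[i] < time[j]) {  # j was still event-free when i had the event
        comparable = comparable + 1
        if (risk[i] > risk[j]) {
          concordant = concordant + 1    # earlier event, higher risk: correct order
        } else if (risk[i] == risk[j]) {
          concordant = concordant + 0.5  # tied predictions count as half
        }
      }
    }
  }
  concordant / comparable
}

time = c(1, 2, 3, 4)
status = c(1, 1, 0, 1)
harrell_c(time, status, risk = c(4, 3, 2, 1))  # perfectly ordered: 1
harrell_c(time, status, risk = c(1, 2, 3, 4))  # perfectly reversed: 0
harrell_c(time, status, risk = c(1, 1, 1, 1))  # uninformative: 0.5
```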

set.seed(1)
bmr = benchmark(benchmark_grid(task, learners, rsmp("cv", folds = 3)))
print(bmr)
## <BenchmarkResult> of 9 rows with 3 resampling runs
##  nr task_id  learner_id resampling_id iters warnings errors
##   1    rats  surv.coxph            cv     3        0      0
##   2    rats surv.kaplan            cv     3        0      0
##   3    rats surv.ranger            cv     3        0      0

mlr3viz::autoplot(bmr, measure = measure)
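The plot summarizes the scores visually. To obtain the numbers, a BenchmarkResult can be scored per resampling iteration with `$score()` and averaged per learner with `$aggregate()`. The sketch below repeats a smaller version of the benchmark (without surv.ranger, to keep dependencies light) so that it runs on its own. It assumes the same mlr3proba version as this section, in which the measure key `surv.harrellC` exists; newer releases may name the C-index measure differently.

```r
library(mlr3)
library(mlr3proba)

task = tsk("rats")
learners = lapply(c("surv.coxph", "surv.kaplan"), lrn)
measure = msr("surv.harrellC")  # measure key as used in this section

set.seed(1)
bmr = benchmark(benchmark_grid(task, learners, rsmp("cv", folds = 3)))

# one row per learner and fold (2 learners x 3 folds)
scores = bmr$score(measure)

# one row per learner: the C-index averaged over the 3 folds
aggr = bmr$aggregate(measure)
print(aggr)
```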