## 7.1 Survival Analysis

Survival analysis examines data on whether a specific event of interest takes place and how long it takes until this event occurs. Ordinary regression analysis is not suitable for survival data for two reasons. Firstly, survival times are strictly positive, so they would need to be transformed (for example, with a logarithm) before an ordinary regression model could be applied without bias. Secondly, ordinary regression analysis cannot deal appropriately with censored observations. Censored observations are observations for which the event of interest has not yet occurred by the end of the observation period. Survival analysis can handle such data, where the observation window is limited and does not always include the event of interest. Note that survival analysis uses both censored and uncensored observations when estimating the model parameters.
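To make the notion of censoring concrete, the following minimal sketch builds a small survival object with the survival package. The follow-up times and event indicators are hypothetical values chosen only for illustration; they are not taken from any data set used in this section.

```r
library("survival")

# Hypothetical follow-up times (in days) for five subjects
time = c(5, 8, 12, 20, 20)
# 1 = event observed, 0 = censored (no event within the observation period)
status = c(1, 0, 1, 0, 1)

# Surv() pairs each time with its event indicator
y = Surv(time, status)
print(y) # censored observations are printed with a trailing "+"

# Number of censored observations
n_censored = sum(status == 0)

# Kaplan-Meier estimate, which uses censored and uncensored observations
fit = survfit(y ~ 1)
summary(fit)
```

The censored subjects still contribute information: they are known to have been event-free up to their recorded time, which is what the Kaplan-Meier estimate exploits.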

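The package mlr3proba extends mlr3 with objects for survival analysis, including survival tasks (TaskSurv), survival learners (such as LearnerSurvCoxPH), and survival measures (such as MeasureSurvUnoC).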

In this example, we demonstrate the basic functionality of the package on the rats data from the survival package. This task ships as a pre-defined TaskSurv with mlr3proba.

library("mlr3")
library("mlr3proba")
# retrieve the pre-defined rats task
task = tsk("rats")
print(task)
## <TaskSurv:rats> (300 x 5)
## * Target: time, status
## * Properties: -
## * Features (3):
##   - int (2): litter, rx
##   - fct (1): sex
# the target column is a survival object:
head(task$truth())
## [1] 101+ 49 104+ 91+ 104+ 102+

# kaplan-meier plot
library("mlr3viz")
autoplot(task)
## Registered S3 method overwritten by 'GGally':
##   method from
##   +.gg   ggplot2

Now, we conduct a small benchmark study on the rats task using some of the integrated survival learners:

# some integrated learners
learners = lapply(c("surv.coxph", "surv.kaplan", "surv.ranger"), lrn)
print(learners)
## [[1]]
## <LearnerSurvCoxPH:surv.coxph>
## * Model: -
## * Parameters: list()
## * Packages: survival, distr6
## * Predict Type: distr
## * Feature types: logical, integer, numeric, factor
## * Properties: importance
##
## [[2]]
## <LearnerSurvKaplan:surv.kaplan>
## * Model: -
## * Parameters: list()
## * Packages: survival, distr6
## * Predict Type: crank
## * Feature types: logical, integer, numeric, character, factor, ordered
## * Properties: missings
##
## [[3]]
## <LearnerSurvRanger:surv.ranger>
## * Model: -
## * Parameters: list()
## * Packages: ranger, distr6
## * Predict Type: distr
## * Feature types: logical, integer, numeric, character, factor, ordered
## * Properties: importance, oob_error, weights

# Uno's C-Index for survival
measure = msr("surv.unoC")
print(measure)
## <MeasureSurvUnoC:surv.unoC>
## * Packages: survAUC
## * Range: [0, 1]
## * Minimize: FALSE
## * Properties: na_score, requires_task, requires_train_set
## * Predict type: crank

set.seed(1)
bmr = benchmark(benchmark_grid(task, learners, rsmp("cv", folds = 3)))
bmr$aggregate(measure)
##    nr  resample_result task_id  learner_id resampling_id iters surv.unoC
## 1:  1 <ResampleResult>    rats  surv.coxph            cv     3    0.9037
## 2:  2 <ResampleResult>    rats surv.kaplan            cv     3    0.0000
## 3:  3 <ResampleResult>    rats surv.ranger            cv     3    0.8640
autoplot(bmr, measure = measure)
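
To inspect a single learner outside of a benchmark, the usual mlr3 train and predict workflow can be applied to the survival task as well. The sketch below is an illustration rather than part of the benchmark above: it assumes a simple random hold-out split (about two thirds of the rows for training) instead of cross-validation, and it scores the prediction with `surv.cindex` instead of Uno's C. The names `train_ids`, `test_ids`, and `prediction` are chosen here for illustration only.

```r
library("mlr3")
library("mlr3proba")

task = tsk("rats")
learner = lrn("surv.coxph")

# Simple random hold-out split: roughly 2/3 of the rows for training
set.seed(1)
train_ids = sample(task$row_ids, size = floor(2 / 3 * task$nrow))
test_ids = setdiff(task$row_ids, train_ids)

# Fit on the training rows and predict on the held-out rows
learner$train(task, row_ids = train_ids)
prediction = learner$predict(task, row_ids = test_ids)

# Concordance index on the held-out rows
prediction$score(msr("surv.cindex"))
```

A single hold-out split gives a noisier estimate than the three-fold cross-validation used in the benchmark, so the score from this sketch should not be compared directly with the aggregated results.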