30  Survival Analysis


Survival analysis with {mlr3proba} is currently in a fragile state after its removal from CRAN. Hence most code examples listed in this page will not work for the time being. We are actively working on a solution!

Survival analysis is a sub-field of supervised machine learning in which the aim is to predict the survival distribution of a given individual. Arguably the main feature of survival analysis is that, unlike classification and regression, learners are trained on two targets:

  1. the time until the event takes place;
  2. the event indicator: whether the observation experienced the event (e.g. death) or was censored.

At a particular time-point, an individual is either alive, dead, or censored. An observation is censored when its event status is unknown. For example, say we are interested in hospital patients and record every day whether they are alive or dead; after a patient leaves the hospital we no longer know whether they are alive or dead, hence they are censored from that point on. If there were no censoring, ordinary regression analysis could be used instead. Furthermore, survival times take only positive values, so methods that assume an unbounded outcome must transform them (e.g. to the log scale) to avoid bias.

Note that survival analysis uses both censored and uncensored observations when estimating model parameters.
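As a minimal illustration of how time and event status are combined, consider the Surv object from the survival package (which mlr3proba builds on); a status of 0 marks a censored observation:

```r
library(survival)

# three observations: events at t = 5 and t = 12, censoring at t = 8
y = Surv(time = c(5, 8, 12), event = c(1, 0, 1))
print(y)
# censored times are printed with a "+" suffix, e.g. 8+
```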

The package mlr3proba (Sonabend et al. 2021) extends mlr3 with objects for survival analysis, including survival tasks (TaskSurv), learners (LearnerSurv), predictions (PredictionSurv), and measures (MeasureSurv).

For a good introduction to survival analysis see Modelling Survival Data in Medical Research (Collett 2014).

30.1 TaskSurv

Unlike TaskClassif and TaskRegr, which have a single ‘target’ argument, TaskSurv mimics the survival::Surv object and has three to four target arguments (depending on the censoring type). A TaskSurv can be constructed with the function as_task_surv():


as_task_surv(survival::bladder2[, -1L], id = "interval_censored",
  time = "start", event = "event", time2 = "stop", type = "interval")

# type = "right" is default
task = as_task_surv(survival::rats, id = "right_censored",
  time = "time", event = "status", type = "right")


# the target column is a survival object:
head(task$truth())

# kaplan-meier plot
library("mlr3viz")
autoplot(task)

30.2 Predict Types - crank, lp, and distr

Every PredictionSurv object can predict one or more of:

  • lp - Linear predictor calculated as the fitted coefficients multiplied by the test data.
  • distr - Predicted survival distribution, either discrete or continuous. Implemented in distr6.
  • crank - Continuous risk ranking.
  • response - Predicted survival time.

lp and crank can be used with measures of discrimination such as the concordance index. Whilst lp is a specific mathematical prediction, crank is any continuous ranking that identifies who is more or less likely to experience the event. So far the only implemented learner that returns only a continuous ranking is surv.svm. If a PredictionSurv returns an lp, then crank is identical to it. Otherwise crank is calculated as the expectation of the predicted survival distribution. Note that for linear proportional hazards models, the ranking (but not necessarily the crank score itself) given by lp and by the expectation of distr is identical.

The example below uses the rats task shipped with mlr3proba.

task = tsk("rats")
learn = lrn("surv.coxph")

train_set = sample(task$nrow, 0.8 * task$nrow)
test_set = setdiff(seq_len(task$nrow), train_set)

learn$train(task, row_ids = train_set)
prediction = learn$predict(task, row_ids = test_set)
prediction

<PredictionSurv> for 60 observations:
    row_ids time status      crank         lp     distr
          3  104  FALSE -0.4069555 -0.4069555 <list[1]>
          4   91  FALSE -2.7557699 -2.7557699 <list[1]>
          6  102  FALSE -3.3385361 -3.3385361 <list[1]>
        291  104  FALSE  0.3716739  0.3716739 <list[1]>
        296  104  FALSE  0.3878953  0.3878953 <list[1]>
        299  104  FALSE -2.5436853 -2.5436853 <list[1]>
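Continuing the example above, the crank prediction can be scored with a discrimination measure such as Harrell's concordance index (a sketch; the exact score depends on the random train/test split):

```r
# evaluate discrimination with Harrell's C-index (computed from crank)
measure = msr("surv.cindex")
prediction$score(measure)
```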

30.3 Composition

Finally we take a look at the PipeOps implemented in mlr3proba, which are used for composition of predict types. For example, a predicted linear predictor does not have much meaning by itself, but it can be composed into a survival distribution. See mlr3pipelines for full tutorials and details on PipeOps.

# PipeOpDistrCompositor - train one model for the baseline distribution
# (Kaplan-Meier or Nelson-Aalen), and another for the linear predictor.
task = tsk("rats")
# remove the factor column for support with glmnet
task$select(c("litter", "rx"))
learner_lp = lrn("surv.glmnet")
learner_distr = lrn("surv.kaplan")
prediction_lp = learner_lp$train(task)$predict(task)
prediction_distr = learner_distr$train(task)$predict(task)

# Doesn't need training. Base = baseline distribution. ph = Proportional hazards.

pod = po("compose_distr", form = "ph", overwrite = FALSE)
prediction = pod$predict(list(base = prediction_distr, pred = prediction_lp))$output

# Now we have a predicted distr!
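The composed distr can then be queried through the distr6 interface; a sketch, as the exact indexing into the distribution depends on the installed distr6/mlr3proba versions:

```r
# survival probability of the first test observation at time 50
prediction$distr[1]$survival(50)
```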


# This can all be simplified by using the distrcompositor pipeline

glm.distr = ppl("distrcompositor", learner = lrn("surv.glmnet"),
  estimator = "kaplan", form = "ph", overwrite = FALSE, graph_learner = TRUE)
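Since graph_learner = TRUE wraps the pipeline in a GraphLearner, it can be used like any other survival learner; for example, on the same task as above:

```r
# the graph learner trains both components and composes the distr prediction
glm.distr$train(task)$predict(task)
```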

30.4 Benchmark Experiment

Finally, we conduct a small benchmark study on the rats task using some of the integrated survival learners:


task = tsk("rats")

# some integrated learners
learners = lrns(c("surv.coxph", "surv.kaplan", "surv.ranger"))
<LearnerSurvCoxPH:surv.coxph>: Cox Proportional Hazards
* Model: -
* Parameters: list()
* Packages: mlr3, mlr3proba, survival, distr6
* Predict Type: distr
* Feature types: logical, integer, numeric, factor
* Properties: weights

<LearnerSurvKaplan:surv.kaplan>: Kaplan-Meier Estimator
* Model: -
* Parameters: list()
* Packages: mlr3, mlr3proba, survival, distr6
* Predict Type: crank
* Feature types: logical, integer, numeric, character, factor, ordered
* Properties: missings

<LearnerSurvRanger:surv.ranger>: Random Forest
* Model: -
* Parameters: num.threads=1
* Packages: mlr3, mlr3proba, mlr3extralearners, ranger
* Predict Type: distr
* Feature types: logical, integer, numeric, character, factor, ordered
* Properties: importance, oob_error, weights
# Harrell's C-Index for survival
measure = msr("surv.cindex")
* Packages: mlr3, mlr3proba
* Range: [0, 1]
* Minimize: FALSE
* Average: macro
* Parameters: weight_meth=I, tiex=0.5
* Properties: -
* Predict type: crank
* Return type: Score
bmr = benchmark(benchmark_grid(task, learners, rsmp("cv", folds = 3)))
INFO  [21:38:53.261] [mlr3] Running benchmark with 9 resampling iterations 
INFO  [21:38:53.359] [mlr3] Applying learner 'surv.ranger' on task 'rats' (iter 3/3) 
INFO  [21:38:53.610] [mlr3] Applying learner 'surv.coxph' on task 'rats' (iter 3/3) 
INFO  [21:38:53.634] [mlr3] Applying learner 'surv.kaplan' on task 'rats' (iter 3/3) 
INFO  [21:38:53.653] [mlr3] Applying learner 'surv.ranger' on task 'rats' (iter 2/3) 
INFO  [21:38:53.882] [mlr3] Applying learner 'surv.ranger' on task 'rats' (iter 1/3) 
INFO  [21:38:54.125] [mlr3] Applying learner 'surv.kaplan' on task 'rats' (iter 2/3) 
INFO  [21:38:54.143] [mlr3] Applying learner 'surv.coxph' on task 'rats' (iter 1/3) 
INFO  [21:38:54.164] [mlr3] Applying learner 'surv.coxph' on task 'rats' (iter 2/3) 
INFO  [21:38:54.186] [mlr3] Applying learner 'surv.kaplan' on task 'rats' (iter 1/3) 
INFO  [21:38:54.208] [mlr3] Finished benchmark 
bmr$aggregate(measure)
   nr      resample_result task_id  learner_id resampling_id iters surv.cindex
1:  1 <ResampleResult[22]>    rats  surv.coxph            cv     3   0.7671307
2:  2 <ResampleResult[22]>    rats surv.kaplan            cv     3   0.5000000
3:  3 <ResampleResult[22]>    rats surv.ranger            cv     3   0.7799153
autoplot(bmr, measure = measure)

The experiment indicates that both the Cox PH model and the random forest discriminate better than the Kaplan-Meier baseline estimator, but that the more flexible random forest is not consistently better than the interpretable Cox PH model.