7 Special Tasks
This chapter explores the different functions of
mlr3 when dealing with specific data sets that require further statistical modification to undertake sensible analysis.
Following topics are discussed:
This sub-chapter explains how to conduct sound survival analysis in mlr3. Survival analysis is used to monitor the period of time until a specific event takes places. This specific event could be e.g. death, transmission of a disease, marriage or divorce. Two considerations are important when conducting survival analysis:
- Whether the event occurred within the frame of the given data
- How much time it took until the event occurred
In summary, this sub-chapter explains how to account for these considerations and conduct survival analysis using the
mlr3proba extension package.
This sub-chapter explains how to conduct (unconditional) density estimation in mlr3. Density estimation is used to estimate the probability density function of a continuous variable. Unconditional density estimation is an unsupervised task so there is no ‘value’ to predict, instead densities are estimated.
This sub-chapter explains how to estimate probability distributions for continuous variables using the
mlr3proba extension package.
Spatial analysis data observations entail reference information about spatial characteristics. One of the largest shortcomings of spatial data analysis is the inevitable auto-correlation in spatial data. Auto-correlation is especially severe in data with marginal spatial variation. The sub-chapter on spatial analysis provides instructions on how to handle the problems associated with spatial data accordingly.
This is work in progress. See mlr3ordinal for the current state.
Functional analysis contains data that consists of curves varying over a continuum e.g. time, frequency or wavelength. This type of analysis is frequently used when examining measurements over various time points. Steps on how to accommodate functional data structures in mlr3 are explained in the functional analysis sub-chapter.
Multilabel classification deals with objects that can belong to more than one category at the same time. Numerous target labels are attributed to a single observation. Working with multilabel data requires one to use modified algorithms, to accommodate data specific characteristics. Two approaches to multilabel classification are prominently used:
- The problem transformation method
- The algorithm adaption method
Instructions on how to deal with multilabel classification in mlr3 can be found in this sub-chapter.
Cost Sensitive Classification
This sub-chapter deals with the implementation of cost-sensitive classification. Regular classification aims to minimize the misclassification rate and thus all types of misclassification errors are deemed equally severe. Cost-sensitive classification is a setting where the costs caused by different kinds of errors are not assumed to be equal. The objective is to minimize the expected costs.
Analytical data for a big credit institution is used as a use case to illustrate the different features. Firstly, the sub-chapter provides guidance on how to implement a first model. Subsequently, the sub-chapter contains instructions on how to modify cost sensitivity measures, thresholding and threshold tuning.