5 Feature Selection
Feature selection, also known as variable or descriptor selection, is the process of finding a subset of features to use with a given task and learner. Using an optimal set of features can have several benefits:
- improved predictive performance, since we reduce overfitting on irrelevant features,
- robust models that do not rely on noisy features,
- simpler models that are easier to interpret,
- faster model fitting, e.g. for model updates,
- faster prediction, and
- no need to collect potentially expensive features.
However, these objectives will not necessarily all be optimized by the same set of features, and thus feature selection is inherently multi-objective. In this chapter, we mostly focus on feature selection as a means of improving predictive performance, but also briefly cover optimization of multiple criteria (Section 5.2.5).
Reducing the number of features can improve models across many scenarios, but it can be especially helpful in datasets that have a high number of features in comparison to the number of datapoints. Many learners perform implicit, also called embedded, feature selection, e.g. via the choice of variables used for splitting in a decision tree. Most other feature selection methods are model agnostic, i.e. they can be used together with any learner. Of the many different approaches to identifying relevant features, we will focus on two general concepts, which are described in detail below: filter and wrapper methods (Guyon and Elisseeff 2003; Chandrashekar and Sahin 2014).
For this chapter, the reader should know the basic concepts of mlr3 (Chapter 2), i.e. know about tasks (Section 2.1) and learners (Section 2.2). Basic knowledge of performance evaluation (Chapter 3), i.e. resampling (Section 3.2) and benchmarking (Section 3.3), is helpful but not strictly necessary.
5.1 Filters
Filter methods are preprocessing steps that can be applied before training a model. A very simple filter approach could look like this:
- calculate the correlation coefficient \(\rho\) between each feature and a numeric target variable, and
- select all features with \(\rho > 0.2\) for further modelling steps.
This approach is a univariate filter because it only considers the univariate relationship between each feature and the target variable. Further, it can only be applied to regression tasks with continuous features, and the threshold of \(\rho > 0.2\) is quite arbitrary. Thus, more advanced filter methods, e.g. multivariate filters based on feature importance, usually perform better (Bommert et al. 2020). On the other hand, a benefit of univariate filters is that they are usually computationally cheaper than more complex filter or wrapper methods. In the following, we describe how to calculate univariate, multivariate and feature importance filters, how to access implicitly selected features, how to integrate filters in a machine learning pipeline and how to optimize filter thresholds.
Filter algorithms select features by assigning numeric scores to each feature, e.g. the correlation between a feature and the target variable, use these scores to rank the features, and select a feature subset based on the ranking. Features that are assigned lower scores can then be omitted in subsequent modeling steps. All filters are implemented via the package mlr3filters. Below, we cover how to
- instantiate a Filter object,
- calculate scores for a given task, and
- use calculated scores to select or drop features.
Feature importance filters (Section 5.1.2) are a special case of filters. They select features that are important according to the model induced by a selected Learner. Feature importance filters rely on the learner to extract information on feature importance from a trained model, for example, by inspecting a learned decision tree and returning the features that are used as split variables, or by computing model-agnostic feature importance (Chapter 10) values for each feature.
Many filter methods are implemented in mlr3filters, for example:
- Correlation, calculating Pearson or Spearman correlation between numeric features and numeric targets (FilterCorrelation)
- Information gain, i.e. mutual information of the feature and the target or the reduction of uncertainty of the target due to a feature (FilterInformationGain)
- Minimal joint mutual information maximization, minimizing the joint information between selected features to avoid redundancy (FilterJMIM)
- Permutation score, which calculates permutation feature importance (see Chapter 10) with a given learner for each feature (FilterPermutation)
- Area under the ROC curve calculated for each feature separately (FilterAUC)
Most of the filter methods have some limitations, e.g. the correlation filter can only be calculated for regression tasks with numeric features. For a full list of all implemented filter methods we refer the reader to the mlr3filters website1, which also shows the supported task and feature types. A benchmark of filter methods was performed by Bommert et al. (2020), who recommend not relying on a single filter method but trying several ones if the available computational resources allow. If only a single filter method is to be used, the authors recommend a feature importance filter based on random forest permutation importance (see Section 5.1.2), similar to the permutation method described above, but the JMIM and AUC filters also performed well in their comparison.
5.1.1 Calculating Filter Values
The first step is to create a new R object using the class of the desired filter method. Similar to other instances in mlr3, these are registered in a dictionary (mlr_filters) with an associated shortcut function flt(). Each object of class Filter has a $calculate() method which computes the filter values and ranks them in descending order. For example, we can use the information gain filter described above.
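A minimal sketch of this step, assuming the mlr3verse meta-package (which includes mlr3filters) is loaded:
library("mlr3verse")
# create the information gain filter from the mlr_filters dictionary
filter = flt("information_gain")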
Such a Filter object can now be used to calculate the filter on the penguins data and get the results.
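A sketch, assuming the filter object created above:
task = tsk("penguins")
# compute the filter scores for all features of the task
filter$calculate(task)
as.data.table(filter)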
feature score
1: flipper_length 0.581167901
2: bill_length 0.544896584
3: bill_depth 0.538718879
4: island 0.520157171
5: body_mass 0.442879511
6: sex 0.007244168
7: year 0.000000000
Some filters have hyperparameters, which can be changed similarly to setting hyperparameters of a Learner using $param_set$values. For example, to calculate "spearman" instead of "pearson" correlation with the correlation filter:
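A sketch, assuming a regression task such as "mtcars" (the correlation filter requires numeric features and a numeric target):
filter_cor = flt("correlation")
# switch the correlation method from the default "pearson" to "spearman"
filter_cor$param_set$values = list(method = "spearman")
filter_cor$calculate(tsk("mtcars"))
as.data.table(filter_cor)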
5.1.2 Feature Importance Filters
To use feature importance filters, we need a learner with an integrated feature importance method. All learners with the property “importance” have this functionality. A list of all learners with this property can be obtained from the learner dictionary.
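A sketch of such a query; the exact list printed below depends on which learner packages (e.g. mlr3learners, mlr3extralearners) are loaded:
# keep only learners whose properties include "importance"
as.data.table(mlr_learners)[sapply(properties, function(p) "importance" %in% p)]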
key label task_type
1: classif.catboost Gradient Boosting classif
2: classif.featureless Featureless Classification Learner classif
3: classif.gbm Gradient Boosting classif
4: classif.imbalanced_rfsrc Imbalanced Random Forest classif
5: classif.lightgbm Gradient Boosting classif
---
22: surv.gbm Gradient Boosting surv
23: surv.mboost Boosted Generalized Additive Model surv
24: surv.ranger Random Forest surv
25: surv.rfsrc Random Forest surv
26: surv.xgboost Gradient Boosting surv
4 variables not shown: [feature_types, packages, properties, predict_types]
or on the mlr3 website2.
For some learners, the desired filter method needs to be set during learner creation. For example, the learner classif.ranger comes with multiple integrated methods, cf. the help page of ranger::ranger(). To use the feature importance method “impurity”, select it during learner construction.
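A sketch; the object is named lrn here because the filter code below refers to it under that name:
library("mlr3learners")
# request impurity-based importance when constructing the random forest
lrn = lrn("classif.ranger", importance = "impurity")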
Now we can use the FilterImportance filter class:
task = tsk("penguins")
# Remove observations with missing data
task$filter(which(complete.cases(task$data())))
filter = flt("importance", learner = lrn)
filter$calculate(task)
as.data.table(filter)
feature score
1: bill_length 76.374739
2: flipper_length 45.348924
3: bill_depth 36.305939
4: body_mass 26.457564
5: island 24.077990
6: sex 1.597289
7: year 1.215536
5.1.3 Embedded Methods
Many learners internally select a subset of the features which they find helpful for prediction, but ignore other features. For example, a decision tree might never select some features for splitting. These subsets can be used for feature selection, which we call embedded methods because the feature selection is embedded in the learner. The selected features (and those not selected) can be queried if the learner has the "selected_features" property, as the following example demonstrates:
task = tsk("penguins")
learner = lrn("classif.rpart")
# ensure that the learner selects features
stopifnot("selected_features" %in% learner$properties)
learner = learner$train(task)
learner$selected_features()
[1] "flipper_length" "bill_length" "island"
The features selected by the model can be extracted by a Filter object, where $calculate() corresponds to training the learner on the given task.
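A sketch, assuming the penguins task and a classification tree as above:
task = tsk("penguins")
learner = lrn("classif.rpart")
# the filter trains the learner and records which features the model used
filter = flt("selected_features", learner = learner)
filter$calculate(task)
as.data.table(filter)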
feature score
1: island 1
2: flipper_length 1
3: bill_length 1
4: bill_depth 0
5: sex 0
6: year 0
7: body_mass 0
Contrary to other filter methods, embedded methods just return values of 1 (selected features) and 0 (dropped features).
5.1.4 Filter-based Feature Selection
After calculating a score for each feature, one has to select the features to be kept or those to be dropped from further modelling steps. For the "selected_features" filter described in embedded methods (Section 5.1.3), this step is straightforward since the method assigns either a value of 1 for a feature to be kept or 0 for a feature to be dropped. With task$select() the features with a value of 1 can be selected:
task = tsk("penguins")
learner = lrn("classif.rpart")
filter = flt("selected_features", learner = learner)
filter$calculate(task)
# select all features used by rpart
keep = names(which(filter$scores == 1))
task$select(keep)
task$feature_names
[1] "bill_length" "flipper_length" "island"
To select features, we use the function task$select() and not task$filter(), which is used to filter rows (not columns) of the data matrix, see task mutators (Section 2.1.3).
For filter methods which assign continuous scores, there are essentially two ways to select features:
- select the top \(k\) features, or
- select all features with a score above a threshold \(\tau\).
The first option is equivalent to dropping the bottom \(p-k\) features. For both options, one has to decide on the number \(k\) or the threshold \(\tau\), which is often quite arbitrary. For example, to implement the first option with the information gain filter:
task = tsk("penguins")
filter = flt("information_gain")
filter$calculate(task)
# select top 3 features from information gain filter
keep = names(head(filter$scores, 3))
task$select(keep)
task$feature_names
[1] "bill_depth" "bill_length" "flipper_length"
Or, the second option with \(\tau = 0.5\):
task = tsk("penguins")
filter = flt("information_gain")
filter$calculate(task)
# select all features with score >0.5 from information gain filter
keep = names(which(filter$scores > 0.5))
task$select(keep)
task$feature_names
[1] "bill_depth" "bill_length" "flipper_length" "island"
Filters can be integrated into pipelines. While pipelines are described in detail in Chapter 6, here is a brief preview:
library(mlr3pipelines)
task = tsk("penguins")
# combine filter (keep top 3 features) with learner
graph = po("filter", filter = flt("information_gain"), filter.nfeat = 3) %>>%
po("learner", lrn("classif.rpart"))
# now it can be used as any learner, but it includes the feature selection
learner = as_learner(graph)
learner$train(task)
Pipelines can also be used to apply hyperparameter optimization (Chapter 4) to the filter, i.e. tune the filter threshold to optimize the feature selection with respect to prediction performance. We first combine a filter with a learner.
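A minimal sketch, assuming the same information gain filter and classification tree as above:
library(mlr3pipelines)
# filter without a fixed number of features; the number will be tuned below
graph = po("filter", filter = flt("information_gain")) %>>%
  po("learner", lrn("classif.rpart"))
learner = as_learner(graph)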
Then we tune how many features to include:
library("mlr3tuning")
ps = ps(information_gain.filter.nfeat = p_int(lower = 1, upper = 7))
instance = TuningInstanceSingleCrit$new(
task = task,
learner = learner,
resampling = rsmp("holdout"),
measure = msr("classif.acc"),
search_space = ps,
terminator = trm("none")
)
tuner = tnr("grid_search")
tuner$optimize(instance)
information_gain.filter.nfeat learner_param_vals x_domain classif.acc
1: 5 <list[2]> <list[1]> 0.9391304
The output above shows only the best result. To show the results of all tuning steps, retrieve them from the archive of the tuning instance:
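For example, a sketch (only the first two of the archive's columns are reproduced in the output below):
as.data.table(instance$archive)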
information_gain.filter.nfeat classif.acc
1: 2 0.9304348
2: 5 0.9391304
3: 1 0.7478261
4: 7 0.9391304
5: 3 0.9391304
6: 6 0.9391304
7: 4 0.9391304
7 variables not shown: [x_domain_information_gain.filter.nfeat, runtime_learners, timestamp, batch_nr, warnings, errors, resample_result]
We can also plot the tuning results:
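For example, with the mlr3viz package (a sketch; the resulting plot is not shown here):
library("mlr3viz")
# plot performance against the number of selected features
autoplot(instance, type = "marginal")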
For more details, see Pipelines (Chapter 6) and Hyperparameter Optimization (Chapter 4).
5.2 Wrapper Methods
Wrapper methods work by fitting models on selected feature subsets and evaluating their performance. This can be done in a sequential fashion, e.g. by iteratively adding features to the model in the so-called sequential forward selection, or in a parallel fashion, e.g. by evaluating random feature subsets in a random search. Below, the use of these simple approaches is described in a common framework along with more advanced methods such as genetic search. It is further shown how to select features by optimizing multiple performance measures and how to wrap a learner with feature selection to use it in pipelines or benchmarks.
In more detail, wrapper methods iteratively select features that optimize a performance measure. Instead of ranking features, a model is fit on a selected subset of features in each iteration and evaluated in resampling with respect to a selected performance measure. The strategy that determines which feature subset is used in each iteration is given by the FSelector object. A simple example is sequential forward selection, which starts with computing each single-feature model, selects the best one, and then iteratively adds the feature that leads to the largest performance improvement. Wrapper methods can be used with any learner, but need to train the learner potentially many times, making them computationally intensive. All wrapper methods are implemented via the package mlr3fselect. In this chapter, we cover how to
- instantiate an FSelector object,
- configure it, e.g. to respect a runtime limit or to optimize different objectives, and
- run it or fuse it with a Learner via an AutoFSelector.
Wrapper-based feature selection is very similar to hyperparameter optimization (Chapter 4). The major difference is that we search for well-performing feature subsets instead of hyperparameter configurations. We will see below that we can even use the same terminators, that some feature selection algorithms are similar to tuners, and that we can also optimize multiple performance measures with feature selection.
5.2.1 Simple Forward Selection Example
We start with the simple example from above and do sequential forward selection with the penguins data:
library("mlr3fselect")
# subset features to ease visualization
task = tsk("penguins")
task$select(c("bill_depth", "bill_length", "body_mass", "flipper_length"))
instance = fselect(
fselector = fs("sequential"),
task = task,
learner = lrn("classif.rpart"),
resampling = rsmp("holdout"),
measure = msr("classif.acc")
)
To show all analyzed feature subsets and the corresponding performance, we use as.data.table(instance$archive).
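A sketch for the first iteration, assuming the archive's batch_nr column identifies the round (subsequent rounds can be inspected analogously with batch_nr == 2 and so on):
dt = as.data.table(instance$archive)
# show the feature indicators and accuracy of the first batch
dt[batch_nr == 1,
  .(bill_depth, bill_length, body_mass, flipper_length, classif.acc)]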
bill_depth bill_length body_mass flipper_length classif.acc
1: TRUE FALSE FALSE FALSE 0.6956522
2: FALSE TRUE FALSE FALSE 0.7652174
3: FALSE FALSE TRUE FALSE 0.7043478
4: FALSE FALSE FALSE TRUE 0.7913043
We see that the feature flipper_length achieved the highest prediction performance in the first iteration and is thus selected. In the second round, adding bill_length improves performance to over 90%:
bill_depth bill_length body_mass flipper_length classif.acc
1: TRUE FALSE FALSE TRUE 0.7652174
2: FALSE TRUE FALSE TRUE 0.9391304
3: FALSE FALSE TRUE TRUE 0.8173913
However, adding a third feature does not improve performance further
bill_depth bill_length body_mass flipper_length classif.acc
1: TRUE TRUE FALSE TRUE 0.9391304
2: FALSE TRUE TRUE TRUE 0.9391304
and the algorithm terminates. To directly show the best feature set, we can use:
instance$result_feature_set
Note that instance$result_feature_set shows features in alphabetical order and not in the order selected.
Internally, the fselect function creates an FSelectInstanceSingleCrit object and executes the feature selection with an FSelector object, based on the selected method, in this example an FSelectorSequential object. It uses the supplied resampling and measure to evaluate all feature subsets provided by the FSelector on the task.
At the heart of mlr3fselect are the R6 classes:
- FSelectInstanceSingleCrit, FSelectInstanceMultiCrit: These two classes describe the feature selection problem and store the results.
- FSelector: This class is the base class for implementations of feature selection algorithms.
In the following two sections, these classes will be created manually, to learn more about the mlr3fselect package.
5.2.2 The FSelectInstance Classes
To create an FSelectInstanceSingleCrit object, we use the sugar function fsi, which is short for FSelectInstanceSingleCrit$new() or FSelectInstanceMultiCrit$new(), depending on the selected measure(s).
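A sketch, using the penguins task, a classification tree, and a terminator that stops after 20 evaluations:
library("mlr3fselect")
instance = fsi(
  task = tsk("penguins"),
  learner = lrn("classif.rpart"),
  resampling = rsmp("holdout"),
  measures = msr("classif.acc"),
  terminator = trm("evals", n_evals = 20)
)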
Note that we have not selected a feature selection algorithm and thus did not select any features yet. We have also supplied a so-called Terminator, which is used to stop the feature selection. For the forward selection in the example above, we did not need a terminator because we simply tried all remaining features until we reached the full model or no further performance improvement. However, for other feature selection algorithms such as random search, a terminator is required. The following terminators are available:
- Terminate after a given time (TerminatorClockTime)
- Terminate after a given amount of iterations (TerminatorEvals)
- Terminate after a specific performance is reached (TerminatorPerfReached)
- Terminate when feature selection does not improve (TerminatorStagnation)
- A combination of the above in an ALL or ANY fashion (TerminatorCombo)
Above we used the sugar function trm to select TerminatorEvals with 20 evaluations.
To start the feature selection, we still need to select an algorithm. Algorithms are defined via the FSelector class, described in the next section.
5.2.3 The FSelector Class
The FSelector class is the base class for different feature selection algorithms. The following algorithms are currently implemented in mlr3fselect:
- Random search, trying random feature subsets until termination (FSelectorRandomSearch)
- Exhaustive search, trying all possible feature subsets (FSelectorExhaustiveSearch)
- Sequential search, i.e. sequential forward or backward selection (FSelectorSequential)
- Recursive feature elimination, which uses a learner's importance scores to iteratively remove features with low feature importance (FSelectorRFE)
- Design points, trying all user-supplied feature sets (FSelectorDesignPoints)
- Genetic search, implementing a genetic algorithm which treats the features as a binary sequence and tries to find the best subset with mutations (FSelectorGeneticSearch)
- Shadow variable search, which adds permuted copies of all features (shadow variables) and stops when a shadow variable is selected (FSelectorShadowVariableSearch)
In this example, we will use a simple random search and retrieve it from the dictionary mlr_fselectors with the fs() sugar function, which is short for FSelectorRandomSearch$new().
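A minimal sketch:
fselector = fs("random_search")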
5.2.4 Starting the Feature Selection
To start the feature selection, we pass the FSelectInstanceSingleCrit object to the $optimize() method of the initialized FSelector object.
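A sketch, assuming the instance and fselector objects created above:
fselector$optimize(instance)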
The algorithm proceeds as follows:
1. The FSelector proposes at least one feature subset and may propose multiple subsets to improve parallelization, which can be controlled via the setting batch_size.
2. For each feature subset, the given learner is fitted on the task using the provided resampling and evaluated with the given measure.
3. All evaluations are stored in the archive of the FSelectInstanceSingleCrit object.
4. The terminator is queried if the budget is exhausted. If the budget is not exhausted, restart with 1) until it is.
5. Determine the feature subset with the best observed performance.
6. Store the best feature subset as the result in the instance object.
The best feature subset and the corresponding measured performance can be accessed from the instance:
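For example, a sketch showing only the selected features and the accuracy:
instance$result[, .(features, classif.acc)]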
features classif.acc
1: bill_depth,bill_length,body_mass,flipper_length,island,sex,... 0.9391304
As in the forward selection example above, one can investigate all resamplings which were undertaken, as they are stored in the archive of the FSelectInstanceSingleCrit object and can be accessed by using as.data.table().
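A sketch; the output below is restricted to three of the feature columns and the accuracy:
as.data.table(instance$archive)[,
  .(bill_depth, bill_length, body_mass, classif.acc)]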
bill_depth bill_length body_mass classif.acc
1: TRUE TRUE TRUE 0.9391304
2: FALSE FALSE TRUE 0.7391304
3: TRUE TRUE TRUE 0.9391304
4: TRUE TRUE TRUE 0.9391304
5: TRUE TRUE TRUE 0.9391304
6: FALSE FALSE FALSE 0.6869565
7: TRUE FALSE TRUE 0.8086957
8: FALSE FALSE TRUE 0.7391304
9: TRUE FALSE FALSE 0.7739130
10: TRUE FALSE FALSE 0.8000000
11: FALSE FALSE FALSE 0.8086957
12: FALSE FALSE TRUE 0.6869565
13: TRUE TRUE TRUE 0.9391304
14: FALSE FALSE FALSE 0.6173913
15: TRUE TRUE TRUE 0.9217391
16: TRUE TRUE TRUE 0.9391304
17: TRUE TRUE TRUE 0.9043478
18: FALSE FALSE TRUE 0.7391304
19: FALSE FALSE FALSE 0.7739130
20: FALSE FALSE FALSE 0.8086957
Now the optimized feature subset can be used to subset the task and fit the model on all observations:
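A sketch, assuming the instance object holding the result:
task = tsk("penguins")
learner = lrn("classif.rpart")
# reduce the task to the selected features and fit the final model
task$select(instance$result_feature_set)
learner$train(task)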
The trained model can now be used to make a prediction on external data.
Predicting on observations present in the task used for feature selection should be avoided. The model has seen these observations already during feature selection and therefore performance evaluation results would be over-optimistic. Instead, to get unbiased performance estimates for the current task, nested resampling (see Section 5.2.6 and Section 4.4) is required.
5.2.5 Optimizing Multiple Performance Measures
You might want to use multiple criteria to evaluate the performance of the feature subsets. For example, you might want to select the subset with the highest classification accuracy and the lowest time to train the model. However, these two subsets will generally not coincide, i.e. the subset with the highest classification accuracy will probably be a different subset than the one with the lowest training time. With mlr3fselect, the result is the Pareto-optimal solution, i.e. the best feature subset for each of the criteria that is not dominated by another subset. For the example with classification accuracy and training time, a feature subset that is best in both accuracy and training time would dominate all other subsets and thus be the only Pareto-optimal solution. If, however, different subsets are best with respect to the two criteria, both subsets are Pareto-optimal.
We will expand the previous example and perform feature selection on the penguins dataset, however, this time we will use FSelectInstanceMultiCrit to select the subset of features that has the highest classification accuracy and the one with the lowest time to train the model.
The feature selection process with multiple criteria is similar to that with a single criterion, except that we select two measures to be optimized:
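A sketch, again using the penguins task and a classification tree:
instance = fsi(
  task = tsk("penguins"),
  learner = lrn("classif.rpart"),
  resampling = rsmp("holdout"),
  # two measures: classification accuracy and training time
  measures = msrs(c("classif.acc", "time_train")),
  terminator = trm("evals", n_evals = 20)
)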
The function fsi creates an instance of FSelectInstanceMultiCrit if more than one measure is selected. We now create an FSelector and call the $optimize() function of the FSelector with the FSelectInstanceMultiCrit object, to search for the subset of features with the best classification accuracy and time to train the model. This time, we use design points to manually specify two feature sets to try: one with only the feature sex and one with all features except island, sex and year. We expect the sex-only model to train fast and the model including many features to be accurate.
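A sketch of this step, where the design is assumed to be a data.table with one logical column per feature:
library("data.table")
design = rbind(
  data.table(bill_depth = FALSE, bill_length = FALSE, body_mass = FALSE,
    flipper_length = FALSE, island = FALSE, sex = TRUE, year = FALSE),
  data.table(bill_depth = TRUE, bill_length = TRUE, body_mass = TRUE,
    flipper_length = TRUE, island = FALSE, sex = FALSE, year = FALSE)
)
fselector = fs("design_points", design = design)
fselector$optimize(instance)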
As above, the best feature subset and the corresponding measured performance can be accessed from the instance. However, in this simple case, if the fastest subset is not also the best performing subset, the result consists of two subsets: one with the lowest training time and one with the best classification accuracy:
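For example, a sketch:
instance$result[, .(features, classif.acc, time_train)]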
features classif.acc time_train
1: sex 0.4347826 0.003
2: bill_depth,bill_length,body_mass,flipper_length 0.9304348 0.004
As explained above, the result is the set of Pareto-optimal solutions.
5.2.6 Automating the Feature Selection
The AutoFSelector class wraps a learner and augments it with an automatic feature selection for a given task. Because the AutoFSelector itself inherits from the Learner base class, it can be used like any other learner. Below, a new learner is created. This learner is then wrapped in a random search feature selector, which automatically starts a feature selection on the given task using an inner resampling, as soon as the wrapped learner is trained. Here, the function auto_fselector creates an instance of AutoFSelector, i.e. it is short for AutoFSelector$new().
at = auto_fselector(
fselector = fs("random_search"),
learner = lrn("classif.log_reg"),
resampling = rsmp("holdout"),
measure = msr("classif.acc"),
terminator = trm("evals", n_evals = 10)
)
at
<AutoFSelector:classif.log_reg.fselector>
* Model: list
* Packages: mlr3, mlr3fselect, mlr3learners, stats
* Predict Type: response
* Feature Types: logical, integer, numeric, character, factor, ordered
* Properties: loglik, twoclass
We can now, as with any other learner, call the $train() and $predict() methods. This time, however, we pass it to benchmark() to compare the optimized feature subset to the complete feature set. This way, the AutoFSelector will do its resampling for feature selection on the training set of the respective split of the outer resampling. The learner then undertakes predictions using the test set of the outer resampling. Here, the outer resampling refers to the resampling specified in benchmark(), whereas the inner resampling is that specified in auto_fselector(). This is called nested resampling (Section 4.4) and yields unbiased performance measures, as the observations in the test set have not been used during feature selection or fitting of the respective learner.
In the call to benchmark(), we compare our wrapped learner at with a normal logistic regression lrn("classif.log_reg"). For that, we create a benchmark grid with the task, the learners and a 3-fold cross-validation on the sonar data.
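A sketch, assuming the wrapped learner at from above and that mlr3learners is loaded for the logistic regression:
grid = benchmark_grid(
  tasks = tsk("sonar"),
  learners = list(at, lrn("classif.log_reg")),
  resamplings = rsmp("cv", folds = 3)
)
bmr = benchmark(grid)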
Now, we compare those two learners regarding classification accuracy and training time:
aggr = bmr$aggregate(msrs(c("classif.acc", "time_train")))
as.data.table(aggr)[, .(learner_id, classif.acc, time_train)]
learner_id classif.acc time_train
1: classif.log_reg.fselector 0.7351967 1.05400000
2: classif.log_reg 0.6585231 0.03566667
We can see that, in this example, the feature selection improves prediction performance but also drastically increases the training time, since the feature selection (including resampling and random search) is part of the model training of the wrapped learner.
5.3 Conclusion
In this chapter, we learned how to perform feature selection with mlr3. We introduced filter and wrapper methods, combined feature selection with pipelines, learned how to automate the feature selection and covered the optimization of multiple performance measures. Table 5.1 gives an overview of the most important functions (S3) and classes (R6) used in this chapter.
S3 function | R6 Class | Summary |
---|---|---|
flt() | Filter | Selects features by calculating a score for each feature |
Filter$calculate() | Filter | Calculates scores on a given task |
fselect() | FSelectInstanceSingleCrit or FSelectInstanceMultiCrit | Specifies a feature selection problem and stores the results |
fs() | FSelector | Specifies a feature selection algorithm |
FSelector$optimize() | FSelector | Executes the feature selection specified by the FSelectInstance with the algorithm specified by the FSelector |
auto_fselector() | AutoFSelector | Defines a learner that includes feature selection |
Resources
- A list of implemented filters in the mlr3filters package is provided on the mlr3filters website3.
- A summary of wrapper-based feature selection with the mlr3fselect package is provided in the mlr3fselect cheatsheet4.
- An overview of feature selection methods is provided by Chandrashekar and Sahin (2014).
- A more formal and detailed introduction to filters and wrappers is given in Guyon and Elisseeff (2003).
- Bommert et al. (2020) perform a benchmark of filter methods.
- Filters can be used as part of a machine learning pipeline (Chapter 6).
- Filters can be optimized with hyperparameter optimization (Chapter 4).
5.4 Exercises
1. Calculate a correlation filter on the Motor Trend data set (mtcars).
2. Use the filter from the first exercise to select the five best features in the mtcars data set.
3. Apply a backward selection to the penguins data set with a classification tree learner "classif.rpart" and holdout resampling by the measure classification accuracy. Compare the results with those in Section 5.2.1. Answer the following questions:
  a. Do the selected features differ?
  b. Which feature selection method achieves a higher classification accuracy?
  c. Are the accuracy values in b) directly comparable? If not, what has to be changed to make them comparable?
4. Automate the feature selection as in Section 5.2.6 with the spam data set and a logistic regression learner ("classif.log_reg"). Hint: Remember to call library("mlr3learners") for the logistic regression learner.