## 4.2 Feature Selection / Filtering

Data sets often include a large number of features. The technique of extracting a subset of relevant features is called “feature selection”. Feature selection can enhance the interpretability of the model, speed up the learning process and improve the learner performance. The literature distinguishes two main approaches to identifying the relevant features: one is called “filtering”, the other is often referred to as “feature subset selection” or “wrapper methods”.

What are the differences (Chandrashekar and Sahin 2014)?

• Filter: An external algorithm computes a ranking of the features (e.g. based on their correlation with the response). The features are then subsetted by a certain criterion, e.g. an absolute number or a percentage of the number of features. The selected features are then used to fit a model (with optional hyperparameters selected by tuning). This is usually cheaper than “feature subset selection” in terms of computation time.
• Feature subset selection: Here, no ranking of features is done. Instead, (random) subsets of the features are selected. A model is fitted on each subset and its performance is checked. This is done for many feature combinations in a cross-validation (CV) setting, and the best combination is reported. This method is computationally very intensive because many models are fitted. Strictly speaking, all these models would also need to be tuned before their performance is estimated, which would require an additional nested level in the CV setting. Finally, a model is fitted again on the selected subset of features (with optional hyperparameters selected by tuning).
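
The difference between the two strategies can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not how mlr3filters works internally: it uses the built-in `mtcars` data with `mpg` as the response, absolute Pearson correlation as the filter score, a linear model as the learner, and a hypothetical helper `cv_rmse()` for 4-fold cross-validation. Hyperparameter tuning is left out entirely.

```r
# Toy comparison of the two strategies on the built-in mtcars data
# (response: mpg). All names below are illustrative assumptions.
set.seed(1)
data = mtcars
target = "mpg"
features = setdiff(names(data), target)

# --- Filter: score every feature once, rank, keep the top 3 ---
scores = sapply(features, function(f) abs(cor(data[[f]], data[[target]])))
ranking = sort(scores, decreasing = TRUE)
filter_selection = names(ranking)[1:3]

# --- Wrapper: judge candidate subsets by cross-validated model error ---
cv_rmse = function(feats, folds = 4) {
  fold_id = rep_len(seq_len(folds), nrow(data))  # deterministic fold assignment
  sq_err = numeric(0)
  for (k in seq_len(folds)) {
    train = data[fold_id != k, ]
    test = data[fold_id == k, ]
    fit = lm(reformulate(feats, response = target), data = train)
    sq_err = c(sq_err, (test[[target]] - predict(fit, newdata = test))^2)
  }
  sqrt(mean(sq_err))
}

# Draw 20 random subsets of size 3 and keep the one with the lowest error
candidates = replicate(20, sample(features, 3), simplify = FALSE)
errors = sapply(candidates, cv_rmse)
wrapper_selection = candidates[[which.min(errors)]]

print(filter_selection)
print(wrapper_selection)
```

In this sketch the filter needs a single scoring pass over the features, whereas the wrapper fits 20 x 4 = 80 models, which shows why the latter becomes expensive quickly.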

There is also a third approach which can be attributed to the “filter” family: the embedded feature-selection methods of some learners. Read more about how to use these in the section on embedded methods.

Ensemble filters build upon the idea of stacking single filter methods. These are not yet implemented.

All feature selection related functionality is implemented via the extension package mlr3filters.

### 4.2.1 Filters

Filter methods assign an importance value to each feature. Based on these values the features can be ranked and a feature subset can be selected. There is a list of all implemented filter methods in the Appendix.
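
As a minimal sketch of the ranking and subsetting step, the following base R snippet applies three common selection criteria (an absolute number, a fraction, and a score cutoff) to a named score vector. The feature names and score values are hypothetical, and rounding the fraction up with `ceiling()` is an assumption made for this example.

```r
# Hypothetical filter scores for four features (higher = more important)
scores = c(a = 0.91, b = 0.15, c = 0.62, d = 0.40)
ranking = sort(scores, decreasing = TRUE)

# 1) keep an absolute number of features
top_n = names(ranking)[seq_len(2)]

# 2) keep a fraction of the features (rounded up)
frac = 0.75
top_frac = names(ranking)[seq_len(ceiling(frac * length(ranking)))]

# 3) keep every feature whose score exceeds a cutoff
above_cutoff = names(ranking)[ranking > 0.5]

print(top_n)
print(top_frac)
print(above_cutoff)
```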

#### 4.2.1.1 Calculating filter values

Currently, only classification and regression tasks are supported.

The first step is to create a new R object using the class of the desired filter method. Each object of class Filter has a .$calculate() method which calculates the ranking of the features. This method can be executed manually, but it is also run implicitly in the background when the actual filter functions (.$filter_nfeat(), .$filter_frac(), .$filter_cutoff()) are executed. All of these functions require a Task; they return the calculated filter values for all features and subset the supplied task:

    library(mlr3filters)
    filter = FilterJMIM$new()

    task = mlr_tasks$get("iris")
    filter$calculate(task)

    as.data.table(filter)
    ##         feature  score
    ## 1: Sepal.Length 1.0401
    ## 2:  Petal.Width 0.9894
    ## 3: Petal.Length 0.9881
    ## 4:  Sepal.Width 0.8314

### 4.2.2 Wrapper Methods

Work in progress :) - via package mlr3fswrap

### 4.2.3 Embedded Methods

All learners with the property “importance” come with integrated feature selection methods.

You can find a list of all learners with this property in the Appendix.

For some learners the desired filter method needs to be set during learner creation. For example, the learner classif.ranger (in mlr3learners) comes with multiple integrated methods. See the help page of ranger::ranger. To use the method “impurity”, you need to set it via the param_vals argument:

    library(mlr3learners)
    lrn = mlr_learners$get("classif.ranger",
      param_vals = list(importance = "impurity"))

Now you can use the mlr3filters::FilterImportance class for algorithm-embedded methods to filter a Task.

    task = mlr_tasks$get("iris")
    filter = FilterImportance$new(learner = lrn)
    filter$calculate(task)
    head(as.data.table(filter), 3)
    ## ...
    ## 3: Sepal.Length  9.937
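
Once importance scores are available, they can be used to reduce the task to its highest-ranked features. The following is a hedged sketch rather than the package's recommended workflow. It assumes that mlr3 and mlr3filters are installed, swaps in the learner classif.rpart (which ships with mlr3 and has the “importance” property) so that ranger is not required, and subsets the task manually with `task$select()`. Keeping exactly two features is an arbitrary choice.

```r
library(mlr3)
library(mlr3filters)

task = mlr_tasks$get("iris")
# Assumption: rpart is used here instead of ranger to keep dependencies small
learner = mlr_learners$get("classif.rpart")

# Score the features via the learner's embedded importance measure
filter = FilterImportance$new(learner = learner)
filter$calculate(task)
scores = as.data.table(filter)  # columns: feature, score (sorted by score)

# Keep the two highest-ranked features and drop the rest from the task
keep = head(scores$feature, 2)
task$select(keep)
print(task$feature_names)
```

The reduced task can then be passed to any learner for training, in the same way as an unfiltered task.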