4.2 Feature Selection / Filtering

Often, data sets include a large number of features. The technique of extracting a subset of relevant features is called “feature selection”. The objective of feature selection is to fit a sparse model that depends only on a suitable subset of the available features. Feature selection can enhance the interpretability of the model, speed up the learning process and improve learner performance. Different approaches exist to identify the relevant features; in the literature, two families are commonly distinguished: one is called “filtering” and the other is often referred to as “feature subset selection” or “wrapper methods”.

What are the differences (Chandrashekar and Sahin 2014)?

  • Filter: An external algorithm computes a rank of the variables (e.g. based on the correlation to the response). Then, features are selected according to a certain criterion, e.g. an absolute number or a percentage of all variables. The selected features are then used to fit a model (with optional hyperparameters selected by tuning). This approach is usually cheaper than “feature subset selection” in terms of computation time; a minimal sketch of this workflow follows the list.
  • Feature subset selection: Here, no ranking of features is done. Instead, a (random) subset of the features is selected, a model is fitted and its performance is checked. This is done for many feature combinations in a cross-validation (CV) setting and the best combination is reported. This method is very computationally intensive, as a large number of models are fitted. Strictly speaking, all of these models would need to be tuned before their performance is estimated, which would require an additional nested level in the CV setting. In the end, a model is fitted on the selected feature subset (with optional hyperparameters selected by tuning).
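
A minimal sketch of the filter workflow with mlr3 and mlr3filters; the correlation filter, the mtcars task, the cutoff of five features and the rpart learner are only illustrative choices:

```r
library(mlr3)
library(mlr3filters)

# regression task with a numeric response
task = tsk("mtcars")

# rank all features by their correlation with the response
filter = flt("correlation")
filter$calculate(task)

# keep the five highest ranked features and fit a model on this subset
top5 = head(names(filter$scores), 5)
task$select(top5)

learner = lrn("regr.rpart")
learner$train(task)
```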

There is also a third approach which can be attributed to the “filter” family: the embedded feature-selection methods of some Learners. Read more about how to use these in the section on variable importance filters below.

Ensemble filters build upon the idea of stacking single filter methods. These are not yet implemented.

All feature selection related functionality is implemented via the extension package mlr3filters.

4.2.1 Filters

Filter methods assign an importance value to each feature. Based on these values, the features can be ranked and a feature subset can be selected. There is a list of all implemented filter methods in the Appendix.

4.2.2 Calculating filter values

Currently, only classification and regression tasks are supported.

The first step is to create a new R object using the class of the desired filter method. Each object of class Filter has a .$calculate() method which computes the filter values and ranks them in descending order.
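
For example, using the JMIM filter via its R6 class FilterJMIM (an arbitrary choice here; this particular filter relies on the praznik package):

```r
library(mlr3)
library(mlr3filters)

# instantiate the filter via its R6 class
filter = FilterJMIM$new()

task = tsk("iris")

# compute the filter scores for all features of the task
filter$calculate(task)

# the scores are stored in decreasing order
as.data.table(filter)
```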

Some filters support changing specific hyperparameters. This is done similarly to setting hyperparameters of a Learner, using .$param_set$values:
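
For instance, the correlation filter exposes the method argument of stats::cor(), which can be switched from the default “pearson” to “spearman”:

```r
library(mlr3filters)

filter_cor = FilterCorrelation$new()

# inspect the available hyperparameters (stats::cor() defaults to method = "pearson")
filter_cor$param_set

# switch to Spearman's rank correlation
filter_cor$param_set$values = list(method = "spearman")
filter_cor$param_set$values
```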

Rather than taking the “long” R6 way to create a filter, there is also a built-in shorthand notation for filter creation:
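
The flt() helper retrieves a filter from the mlr_filters dictionary; construction arguments such as hyperparameter values can be passed along (the Spearman example is again only illustrative):

```r
library(mlr3filters)

# equivalent to FilterJMIM$new()
filter = flt("jmim")

# hyperparameters can be set directly during construction
filter_cor = flt("correlation", method = "spearman")
```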

4.2.3 Variable Importance Filters

All Learners with the property “importance” come with integrated feature selection methods.

You can find a list of all learners with this property in the Appendix.
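
One way to query this programmatically is the mlr_learners dictionary; this sketch assumes that its data.table conversion exposes the key and properties columns:

```r
library(mlr3)
library(mlr3learners)

# keys of all learners which report a feature importance
tab = as.data.table(mlr_learners)
tab[sapply(properties, function(p) "importance" %in% p), key]
```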

For some learners the desired importance method needs to be chosen during learner creation. For example, the learner classif.ranger (in package mlr3learners) comes with multiple integrated methods, see the help page of ranger::ranger. To use method “impurity”, you need to select it during construction of the learner.
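
A minimal construction call (requires the mlr3learners extension package and the ranger package):

```r
library(mlr3)
library(mlr3learners)

# ranger supports several importance measures; choose "impurity" at construction time
learner = lrn("classif.ranger", importance = "impurity")
```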

Now you can use the mlr3filters::FilterImportance class to filter a Task with this algorithm-embedded method:
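
A short sketch using the iris task (the task choice and the number of displayed rows are merely illustrative):

```r
library(mlr3)
library(mlr3filters)
library(mlr3learners)

task = tsk("iris")
learner = lrn("classif.ranger", importance = "impurity")

# wrap the learner's embedded importance method in a filter
filter = flt("importance", learner = learner)
filter$calculate(task)
head(as.data.table(filter), 3)
```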

4.2.4 Ensemble Methods

Work in progress :)

4.2.5 Wrapper Methods

Work in progress :) - via package mlr3fswrap

References

Chandrashekar, Girish, and Ferat Sahin. 2014. “A Survey on Feature Selection Methods.” Computers and Electrical Engineering 40 (1): 16–28. https://doi.org/10.1016/j.compeleceng.2013.11.024.