3.2 Feature Selection / Filtering
Often, data sets include a large number of features. The technique of extracting a subset of relevant features is called “feature selection”.
The objective of feature selection is to fit the sparse dependence of a model on a subset of the available features in the most suitable manner. Feature selection can enhance the interpretability of the model, speed up the learning process and improve the learner's performance. Different approaches exist to identify the relevant features. Two are emphasized in the literature: one is called filtering, and the other is often referred to as feature subset selection or wrapper methods.
What are the differences (Chandrashekar and Sahin 2014)?
- Filtering: An external algorithm computes a ranking of the features (e.g. based on their correlation with the response). The features are then subsetted by a certain criterion, e.g. an absolute number or a percentage of the number of features. The selected features are then used to fit a model (with optional hyperparameters selected by tuning). This calculation is usually cheaper than “feature subset selection” in terms of computation time.
- Wrapper Methods: Here, no ranking of features is done. Instead, a (random) subset of the features is selected. Then, we fit a model and subsequently assess its performance. This is done for many feature combinations in a cross-validation (CV) setting, and the best combination is reported. This method is very computationally intensive as a lot of models are fitted. Also, strictly speaking, all these models would need to be tuned before the performance is estimated, which would require an additional nested level in the CV setting. After undertaking all of these steps, a model is fitted again on the selected subset of features (with optional hyperparameters selected by tuning).
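The difference between the two approaches can be sketched in a few lines of base R. This is only an illustration under stated assumptions, none of which come from the text: the `mtcars` data with `mpg` as response, absolute Pearson correlation as the filter score, a cutoff of `k = 3`, an exhaustive search over all two-feature subsets as the wrapper strategy, a linear model as the learner, and 4-fold CV without any tuning.

```r
# Toy data (assumption): predict mpg from the remaining columns of mtcars.
data = mtcars
target = "mpg"
features = setdiff(names(data), target)

## Filter: rank features by absolute correlation with the response, keep the top k.
scores = sapply(features, function(f) abs(cor(data[[f]], data[[target]])))
ranking = sort(scores, decreasing = TRUE)
k = 3
filter_subset = names(ranking)[seq_len(k)]

## Wrapper: score candidate subsets by the cross-validated error of a fitted model.
cv_rmse = function(subset, folds) {
  errs = sapply(sort(unique(folds)), function(i) {
    train = data[folds != i, , drop = FALSE]
    test = data[folds == i, , drop = FALSE]
    fit = lm(reformulate(subset, response = target), data = train)
    sqrt(mean((test[[target]] - predict(fit, newdata = test))^2))
  })
  mean(errs)
}

set.seed(1)
folds = sample(rep(1:4, length.out = nrow(data)))  # fold id for every row
candidates = combn(features, 2, simplify = FALSE)  # all 2-feature subsets
perf = sapply(candidates, cv_rmse, folds = folds)
wrapper_subset = candidates[[which.min(perf)]]

print(filter_subset)
print(wrapper_subset)
```

The filter computes one score per feature and fits no model at all, whereas the wrapper fits one model per fold for each of the 45 candidate subsets, which is where the difference in computation time comes from.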
There is also a third approach which can be attributed to the “filter” family: the embedded feature-selection methods of some learners. Read more about how to use these in section embedded feature-selection methods.
Ensemble filters build upon the idea of stacking single filter methods. These are not yet implemented.
All functionality that is related to feature selection is implemented via the extension package mlr3filters.
Filter methods assign an importance value to each feature. Based on these values the features can be ranked. Thereafter, we are able to select a feature subset. There is a list of all implemented filter methods in the Appendix.
3.2.2 Calculating filter values
Currently, only classification and regression tasks are supported.
The first step is to create a new R object using the class of the desired filter method.
Each object of class Filter has a .$calculate() method which calculates the filter values and ranks them in descending order.
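A minimal sketch of these two steps, assuming the packages mlr3, mlr3filters and data.table are installed, plus praznik, which the JMIM filter relies on. The choice of `FilterJMIM` and of the built-in iris task is an assumption made for illustration; the text does not state which filter or task it used, so the scores from this sketch need not match any listing in this section.

```r
library(mlr3)
library(mlr3filters)
library(data.table)

# Step 1: create the filter object from its R6 class (assumed choice: JMIM).
filter = FilterJMIM$new()

# Step 2: calculate the filter values on a task (assumed: the built-in iris task).
task = tsk("iris")
filter$calculate(task)

# One row per feature with its score, sorted by score in descending order.
scores = as.data.table(filter)
print(scores)
```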
##         feature  score
## 1: Sepal.Length 1.0401
## 2:  Petal.Width 0.9894
## 3: Petal.Length 0.9881
## 4:  Sepal.Width 0.8314
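Once the scores are calculated, the ranking can be used to select a feature subset, as described at the start of this section. The following is a hedged sketch of one way to do this: it assumes the `$scores` field of a calculated filter (a named numeric vector sorted in descending order) and the `$select()` method of a task. The ANOVA filter and the cutoff of two features are arbitrary choices for illustration.

```r
library(mlr3)
library(mlr3filters)

task = tsk("iris")

# Assumed filter choice: ANOVA F-test scores, which need no extra packages.
filter = FilterAnova$new()
filter$calculate(task)

# Keep the names of the two highest-ranked features (cutoff is an assumption).
keep = names(head(filter$scores, 2))

# Restrict the task to the selected features; a model can then be fitted on it.
task$select(keep)
print(task$feature_names)
```

Note that `$select()` modifies the task in place, so clone the task first if the full feature set is still needed.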
Some filters support changing specific hyperparameters.
This is done similarly to setting hyperparameters of a learner.
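As a sketch of how the available hyperparameters might be inspected, the example below prints the parameter set of a filter. It assumes the correlation filter (`FilterCorrelation`), chosen here because its parameters `use` and `method` are the ones that appear in the listings of this section.

```r
library(mlr3)
library(mlr3filters)

# Create the filter via its R6 class (assumed choice: the correlation filter).
filter_cor = FilterCorrelation$new()

# The parameter set lists each hyperparameter with its allowed levels,
# its default and the currently set value.
print(filter_cor$param_set)
```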
## <ParamSet>
##        id    class lower upper
## 1:    use ParamFct    NA    NA
## 2: method ParamFct    NA    NA
##                                                                  levels
## 1: everything,all.obs,complete.obs,na.or.complete,pairwise.complete.obs
## 2:                                             pearson,kendall,spearman
##       default value
## 1: everything
## 2:    pearson
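Changing a hyperparameter could then look as follows. This is a sketch that again assumes the correlation filter; it sets `method` to `"spearman"` through the `values` field of the parameter set and prints the parameter set afterwards.

```r
library(mlr3)
library(mlr3filters)

filter_cor = FilterCorrelation$new()

# Set the correlation method; the other hyperparameters keep their defaults.
filter_cor$param_set$values$method = "spearman"

# The "value" column should now show the changed setting.
print(filter_cor$param_set)
```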
## <ParamSet>
##        id    class lower upper
## 1:    use ParamFct    NA    NA
## 2: method ParamFct    NA    NA
##                                                                  levels
## 1: everything,all.obs,complete.obs,na.or.complete,pairwise.complete.obs
## 2:                                             pearson,kendall,spearman
##       default    value
## 1: everything
## 2:    pearson spearman
Rather than taking the “long” R6 way to create a filter, there is also a built-in shorthand notation for filter creation:
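A short sketch of the shorthand, assuming the `flt()` helper from mlr3filters, which looks up a filter by its key. The key `"cmim"` is used here because it is the filter id that appears in the printed output in this section.

```r
library(mlr3)
library(mlr3filters)

# Shorthand for FilterCMIM$new(): retrieve the filter by its key.
filter = flt("cmim")

# Printing shows the supported task types, required packages and feature types.
print(filter)
```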
## <FilterCMIM:cmim>
## Task Types: classif, regr
## Task Properties: -
## Packages: praznik
## Feature types: integer, numeric, factor, ordered
3.2.3 Variable Importance Filters
Learners with the property “importance” come with integrated feature selection methods.
You can find a list of all learners with this property in the Appendix.
For some learners the desired filter method needs to be set during learner creation.
For example, learner classif.ranger (in the package mlr3learners) comes with multiple integrated methods.
See the help page of the learner for the available methods.
To use method “impurity”, you need to set the filter method during construction.
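The following sketch shows how this might be done, assuming the packages mlr3learners and ranger are installed. It passes `importance = "impurity"` when constructing the learner and wraps the learner in `FilterImportance`; the iris task and the cutoff of three rows are assumptions for illustration. Because a random forest is fitted, the scores vary between runs and need not match any listing in this section.

```r
library(mlr3)
library(mlr3filters)
library(mlr3learners)
library(data.table)

# Set the importance method during construction of the learner.
learner = lrn("classif.ranger", importance = "impurity")

# Use the learner's variable importance as filter scores.
filter = FilterImportance$new(learner = learner)

task = tsk("iris")  # assumed example task
filter$calculate(task)

# Show the three highest-ranked features.
top3 = head(as.data.table(filter), 3)
print(top3)
```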
##         feature  score
## 1:  Petal.Width 44.276
## 2: Petal.Length 43.604
## 3: Sepal.Length  9.402
3.2.4 Ensemble Methods
Work in progress.
3.2.5 Wrapper Methods
Work in progress - via package mlr3fswrap
Chandrashekar, Girish, and Ferat Sahin. 2014. “A Survey on Feature Selection Methods.” Computers and Electrical Engineering 40 (1): 16–28. https://doi.org/10.1016/j.compeleceng.2013.11.024.