3.2 Feature Selection / Filtering
Data sets often include a large number of features. The technique of extracting a subset of relevant features is called “feature selection”. Its objective is to fit the sparse dependence of a model on a subset of the available features in the most suitable manner. Feature selection can enhance the interpretability of the model, speed up the learning process, and improve learner performance. Different approaches exist to identify the relevant features. Two distinct approaches are emphasized in the literature: one is called filtering, and the other is often referred to as feature subset selection or wrapper methods.
The two approaches differ as follows (Chandrashekar and Sahin 2014):
- Filtering: An external algorithm computes a ranking of the variables (e.g. based on their correlation with the response). The features are then subsetted by a certain criterion, e.g. an absolute number or a percentage of the number of variables. The selected features are then used to fit a model (with optional hyperparameters selected by tuning). This calculation is usually cheaper than “feature subset selection” in terms of computation time.
- Wrapper Methods: Here, no ranking of features is done. Instead, a (random) subset of the features is selected. Then, we fit a model and subsequently assess its performance. This is done for many feature combinations in a cross-validation (CV) setting, and the best combination is reported. This method is computationally very intensive because many models are fitted. Also, strictly speaking, all of these models would need to be tuned before the performance is estimated, which would require an additional nested level in the CV setting. After all of these steps, a model is fitted again on the selected subset of features (with optional hyperparameters selected by tuning).
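The wrapper procedure described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the implementation of any package: the data set (mtcars), the linear model, the number of candidate subsets, and the helper name cv_rmse are all chosen here for illustration only. No hyperparameter tuning or nested resampling is done, so the reported CV score of the winning subset is optimistic.

```r
# Wrapper-style feature selection in base R (illustrative sketch only).
set.seed(1)
data = mtcars
target = "mpg"
features = setdiff(names(data), target)

# Hypothetical helper: k-fold cross-validated RMSE of a linear model
# fitted on a given feature subset.
cv_rmse = function(subset, data, target, folds) {
  k = max(folds)
  errs = numeric(k)
  for (i in seq_len(k)) {
    train = data[folds != i, , drop = FALSE]
    test = data[folds == i, , drop = FALSE]
    fit = lm(reformulate(subset, response = target), data = train)
    pred = predict(fit, newdata = test)
    errs[i] = sqrt(mean((test[[target]] - pred)^2))
  }
  mean(errs)
}

# Assign every row to one of 4 folds; the same folds are used for all subsets.
folds = sample(rep(1:4, length.out = nrow(data)))

# Draw random feature subsets (1 to 4 features each) and score each one.
n_candidates = 20
candidates = lapply(seq_len(n_candidates), function(i) {
  size = sample(1:4, 1)
  sort(sample(features, size))
})
scores = vapply(candidates, cv_rmse, numeric(1),
  data = data, target = target, folds = folds)

# Report the best combination and refit on all data with that subset.
best_subset = candidates[[which.min(scores)]]
final_model = lm(reformulate(best_subset, response = target), data = data)
print(best_subset)
```

Even this small example fits 80 models (20 subsets times 4 folds) before the final refit, which illustrates why wrapper methods are more expensive than filters.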
There is also a third approach which can be attributed to the “filter” family: the embedded feature-selection methods of some learners. Read more about how to use these in the section on embedded feature-selection methods.
Ensemble filters build upon the idea of stacking single filter methods. These are not yet implemented.
All functionality that is related to feature selection is implemented via the extension package
Filter methods assign an importance value to each feature. Based on these values the features can be ranked. Thereafter, we are able to select a feature subset. There is a list of all implemented filter methods in the Appendix.
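The three steps (score, rank, subset) can be sketched in base R. This is an illustration under assumptions, not the package's implementation: the data set (mtcars), the absolute Pearson correlation as the filter score, and the cut-offs of 3 features and 30% are chosen here for the example.

```r
# Filter-style feature selection in base R (illustrative sketch only).
data = mtcars
target = "mpg"
features = setdiff(names(data), target)

# Step 1: assign an importance value to each feature, here the absolute
# Pearson correlation with the response.
scores = vapply(features, function(f) {
  abs(cor(data[[f]], data[[target]]))
}, numeric(1))

# Step 2: rank the features by their score in descending order.
ranking = sort(scores, decreasing = TRUE)

# Step 3: select a subset, either by an absolute number of features ...
top_k = names(ranking)[seq_len(3)]
# ... or by a percentage of the number of features.
n_keep = max(1, round(0.3 * length(ranking)))
top_frac = names(ranking)[seq_len(n_keep)]

# The selected features are then used to fit a model.
model = lm(reformulate(top_k, response = target), data = data)
print(ranking)
```

Only one pass over the features is needed to compute the scores, and a single model is fitted afterwards.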
3.2.2 Calculating filter values
Currently, only classification and regression tasks are supported.
The first step is to create a new R object using the class of the desired filter method.
Each object of class Filter has a .$calculate() method which calculates the filter values and ranks them in descending order.
Some filters support changing specific hyperparameters.
This is done similarly to setting the hyperparameters of a learner, via .$param_set$values:
filter_cor = FilterCorrelation$new()
filter_cor$param_set
## ParamSet:
##        id    class lower upper
## 1:    use ParamFct    NA    NA
## 2: method ParamFct    NA    NA
##                                                                  levels
## 1: everything,all.obs,complete.obs,na.or.complete,pairwise.complete.obs
## 2:                                             pearson,kendall,spearman
##       default value
## 1: everything
## 2:    pearson

# change parameter 'method'
filter_cor$param_set$values = list(method = "spearman")
filter_cor$param_set
## ParamSet:
##        id    class lower upper
## 1:    use ParamFct    NA    NA
## 2: method ParamFct    NA    NA
##                                                                  levels
## 1: everything,all.obs,complete.obs,na.or.complete,pairwise.complete.obs
## 2:                                             pearson,kendall,spearman
##       default    value
## 1: everything
## 2:    pearson spearman
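The levels of the parameters use and method listed above coincide with the arguments of the same names in base R's stats::cor(). Assuming the filter passes them through to that function (an assumption, not something the text confirms), the effect of switching from "pearson" to "spearman" can be sketched directly in base R. The data set (mtcars) and the helper name abs_cor are chosen here for illustration.

```r
# Effect of the 'method' parameter on correlation-based scores,
# sketched with stats::cor() directly (illustrative only).
data = mtcars
target = "mpg"
features = setdiff(names(data), target)

# Hypothetical helper: absolute correlation of every feature with the target.
abs_cor = function(method) {
  vapply(features, function(f) {
    abs(cor(data[[f]], data[[target]], use = "everything", method = method))
  }, numeric(1))
}

comparison = data.frame(
  feature = features,
  pearson = abs_cor("pearson"),
  spearman = abs_cor("spearman")
)

# Order by the Spearman score, descending, to compare both rankings.
comparison = comparison[order(-comparison$spearman), ]
print(comparison, row.names = FALSE)
```

Spearman's coefficient is the Pearson correlation computed on ranks, so it rewards monotone but non-linear relationships that Pearson's coefficient understates. The two methods may therefore order the features differently.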
Rather than taking the “long” R6 way to create a filter, there is also a built-in shorthand notation for filter creation:
3.2.3 Variable Importance Filters
Learners with the property “importance” come with integrated feature selection methods.
You can find a list of all learners with this property in the Appendix.
For some learners the desired filter method needs to be set during learner creation.
For example, the learner classif.ranger (in the package mlr3learners) comes with multiple integrated methods.
See the help page of
To use method “impurity”, you need to set the filter method during construction.
3.2.4 Ensemble Methods
Work in progress :)
3.2.5 Wrapper Methods
Work in progress :) - via package mlr3fswrap
Chandrashekar, Girish, and Ferat Sahin. 2014. “A Survey on Feature Selection Methods.” Computers and Electrical Engineering 40 (1): 16–28. https://doi.org/10.1016/j.compeleceng.2013.11.024.