API Documentation
class bca.bca.BCA(estimator, scoring='accuracy', cv=5, delta=1e-05)
Feature Selection with Binary Coordinate Ascent Algorithm
Given an external estimator, the binary coordinate ascent (BCA) algorithm selects the features that maximize an objective function of that estimator. It returns a binary vector whose length equals the number of features, where 0 or 1 at a position indicates that the corresponding feature is not selected or selected, respectively. First, the best feature subset is initialized as a binary vector. The default initialization is the all-zeros vector, which corresponds to no input features selected. The objective function of the specified estimator is then calculated for this initial subset. The BCA algorithm then iteratively selects or removes features, one at a time, by flipping elements of the binary vector and checking whether each selection or removal increases the objective function. This pass over the vector is repeated until a convergence criterion is reached (which can be a number of iterations or a delta on the objective value). The algorithm returns the binary vector corresponding to the “best” subset of features.
Read more in the reference link specified below:
http://www.sciencedirect.com/science/article/pii/S0950705116302416
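The procedure described above can be illustrated with a minimal sketch. This is not the library's implementation: the function name `binary_coordinate_ascent`, the `max_sweeps` cap, and the toy objective are assumptions made for illustration, and in BCA itself the objective would be a cross-validated score of the estimator rather than a hand-written function.

```python
def binary_coordinate_ascent(objective, n_features, initial_subset=None,
                             delta=1e-5, max_sweeps=20):
    """Sketch of BCA: flip one bit at a time, keep flips that raise the objective."""
    # Start from the given subset, or from all zeros (no features selected).
    mask = list(initial_subset) if initial_subset is not None else [0] * n_features
    best = objective(mask)
    for _ in range(max_sweeps):  # max_sweeps is a hypothetical safety cap
        score_at_sweep_start = best
        for i in range(n_features):
            mask[i] = 1 - mask[i]        # flip: select or remove feature i
            candidate = objective(mask)
            if candidate > best:
                best = candidate         # keep the flip
            else:
                mask[i] = 1 - mask[i]    # undo the flip
        # Stop when a full sweep improved the objective by less than delta.
        if best - score_at_sweep_start < delta:
            break
    return mask, best


# Toy objective (hypothetical): reward features 0 and 2, penalize the others.
def toy_objective(mask):
    useful = {0, 2}
    return sum(1.0 if i in useful else -0.5 for i, bit in enumerate(mask) if bit)


mask, score = binary_coordinate_ascent(toy_objective, n_features=4)
```

With this toy objective the first sweep selects features 0 and 2, and the second sweep finds no improving flip, so the loop stops.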
Parameters: - estimator : object
A supervised learning estimator with a ``fit`` method that will be used along with an objective function, in order to calculate the importance of a feature subset.
- scoring : string
The metric used as the objective to maximize, e.g., roc_auc or accuracy. Note: at the moment, the sklearn cross_val_score call inside the BCA class supports roc_auc for binary classification only.
- cv : int, cross-validation generator or an iterable, optional
The cv parameter used inside the sklearn cross_val_score.
- delta : float
The delta used to determine the convergence of the objective function.
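To make the roles of `scoring` and `cv` concrete, the sketch below shows one plausible way to score a candidate feature subset with scikit-learn's `cross_val_score`. The helper `subset_score` and its handling of an empty subset are assumptions for illustration; the source does not show how BCA computes its objective internally.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB


def subset_score(estimator, X, y, mask, scoring="accuracy", cv=5):
    """Mean cross-validated score of `estimator` on the columns where mask == 1."""
    selected = np.flatnonzero(mask)
    if selected.size == 0:
        # Assumption: an empty subset scores 0.0, so any useful feature beats it.
        return 0.0
    scores = cross_val_score(estimator, X[:, selected], y, scoring=scoring, cv=cv)
    return float(scores.mean())


X, y = load_breast_cancer(return_X_y=True)
mask = np.zeros(X.shape[1], dtype=int)
mask[[0, 3]] = 1  # arbitrary subset, chosen only for illustration
score = subset_score(GaussianNB(), X, y, mask)
```

A function like this could serve as the objective that the coordinate-ascent loop tries to increase, with `delta` deciding when further gains are too small to continue.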
Examples
The following example shows how to select the optimal subset of features in the breast cancer dataset.
>>> from bca import BCA
>>> from sklearn.datasets import load_breast_cancer
>>> from sklearn.naive_bayes import GaussianNB
>>> X, y = load_breast_cancer().data, load_breast_cancer().target
>>> estimator = GaussianNB()
>>> selector = BCA(estimator, scoring='accuracy', cv=5)
>>> selector = selector.fit(X, y)
>>> selector.features
[ 1 4 6 7 16 20 21 22 23 27 28]
>>> selector.score
0.971989226626
>>> selector.predict(X[20:25])
[1 1 0 0 0]
Methods
fit, get_params, predict, set_params
fit(X, y, initial_subset=None, fit_estimator=True, verbose=True)
Fit the BCA model to find the best subset of features and, optionally, fit the estimator on that subset.
Parameters: - X : {array-like, sparse matrix}, shape = [n_samples, n_features]
The training input samples.
- y : array-like, shape = [n_samples]
The target values.
- initial_subset : binary vector, shape = [n_features]
The initial subset. Defaults to None, which corresponds to all zeros (no features selected).
- fit_estimator : boolean
Whether to fit the estimator on the final selected features.
- verbose : boolean
Indicates the verbosity of the algorithm.
Returns: - self : class object
The BCA object with the trained classifier.
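Because `initial_subset` is a binary vector while the example output of `selector.features` is a list of column indices, converting between the two forms can be handy. The helpers below are hypothetical and not part of the `bca` package; they are a small sketch using NumPy only.

```python
import numpy as np


def indices_to_mask(indices, n_features):
    """Build a binary vector with ones at the given feature indices."""
    mask = np.zeros(n_features, dtype=int)
    mask[list(indices)] = 1
    return mask


def mask_to_indices(mask):
    """Return the positions of the selected (non-zero) features."""
    return np.flatnonzero(np.asarray(mask))


# Warm-start vector for a 30-feature dataset, seeded with three features.
initial_subset = indices_to_mask([1, 4, 6], n_features=30)
```

Given the signature documented above, such a vector would presumably be passed as `selector.fit(X, y, initial_subset=initial_subset)`.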