API Documentation

class bca.bca.BCA(estimator, scoring='accuracy', cv=5, delta=1e-05)

Feature Selection with Binary Coordinate Ascent Algorithm

Given an external estimator, the goal of the binary coordinate ascent (BCA) algorithm is to select the features that maximize an objective function of that estimator. It returns a binary vector whose length equals the number of features, where a zero or a one indicates that the feature at that position is not selected or selected, respectively. First, the best feature subset is initialized as a binary vector. The default initialization is the all-zeros vector, corresponding to no input features selected. The objective function of the specified estimator is then calculated for this initial subset. The BCA algorithm then iteratively selects or removes features, one at a time, by flipping the elements of the binary vector and checking whether each selection or removal increases the objective function. This process is repeated over the vector several times until a convergence criterion is reached (either a number of iterations or a delta on the objective value). The algorithm returns the binary vector corresponding to the "best" subset of features.
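The search loop described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the package's implementation: the function name binary_coordinate_ascent, the objective callable, the max_sweeps cap, and the rule "stop when a full sweep improves the objective by less than delta" are all hypothetical choices made for the sketch.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def binary_coordinate_ascent(
    objective: Callable[[Sequence[int]], float],
    n_features: int,
    initial_subset: Optional[Sequence[int]] = None,
    delta: float = 1e-5,
    max_sweeps: int = 100,
) -> Tuple[List[int], float]:
    """Hypothetical BCA sketch: flip one bit at a time and keep improving flips."""
    # Start from the given subset, or from "no features selected".
    best = list(initial_subset) if initial_subset is not None else [0] * n_features
    best_score = objective(best)

    for _ in range(max_sweeps):  # assumed cap on the number of passes
        score_at_sweep_start = best_score
        # One sweep: try selecting/removing each feature in turn.
        for j in range(n_features):
            candidate = best.copy()
            candidate[j] = 1 - candidate[j]
            score = objective(candidate)
            if score > best_score:  # keep the flip only if it helps
                best, best_score = candidate, score
        # Assumed convergence rule: a whole sweep gained less than delta.
        if best_score - score_at_sweep_start < delta:
            break
    return best, best_score


# Toy objective (hypothetical): features 0 and 2 help, features 1 and 3 hurt.
weights = [0.5, -0.2, 0.3, -0.1]


def toy(mask: Sequence[int]) -> float:
    return sum(w * m for w, m in zip(weights, mask))


mask, score = binary_coordinate_ascent(toy, n_features=4)
print(mask, score)  # mask == [1, 0, 1, 0]
```

In the real class the objective would be a cross-validated score of the estimator on the selected columns rather than a fixed weighted sum.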

For details, see the reference below:

http://www.sciencedirect.com/science/article/pii/S0950705116302416

Parameters:
estimator : object

A supervised learning estimator with a fit method, used together with the objective function to evaluate the quality of a feature subset.

scoring : string, default 'accuracy'

The metric to be maximized as the objective, passed to scikit-learn's cross_val_score, e.g. 'roc_auc' or 'accuracy'. Note: when scoring='roc_auc', the cross_val_score call inside the BCA class currently supports binary classification only.

cv : int, cross-validation generator or an iterable, optional

The cross-validation strategy, passed as the cv argument of scikit-learn's cross_val_score. Default is 5.

delta : float, default 1e-05

The tolerance on the change in the objective value used to determine convergence.
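To make the roles of estimator, scoring, and cv concrete, here is a sketch of how one feature subset could be scored with scikit-learn's cross_val_score. The helper subset_objective and the choice to give an empty subset a score of -inf are assumptions for illustration; the BCA class may handle these cases differently.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB


def subset_objective(estimator, X, y, mask, scoring="accuracy", cv=5):
    """Hypothetical objective: mean cross-validated score on the masked columns."""
    columns = np.flatnonzero(mask)  # indices of features whose bit is 1
    if columns.size == 0:
        # Assumption: an empty subset cannot be scored, so give it the lowest value.
        return -np.inf
    scores = cross_val_score(estimator, X[:, columns], y, scoring=scoring, cv=cv)
    return float(np.mean(scores))


X, y = load_breast_cancer(return_X_y=True)
mask = np.zeros(X.shape[1], dtype=int)
mask[[0, 1]] = 1  # use only the first two features
print(subset_objective(GaussianNB(), X, y, mask, scoring="roc_auc", cv=5))
```

The breast cancer target is binary, so 'roc_auc' is valid here, consistent with the note on the scoring parameter.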

Examples

The following example shows how to select a subset of features on the breast cancer dataset.

>>> from bca import BCA
>>> from sklearn.datasets import load_breast_cancer
>>> from sklearn.naive_bayes import GaussianNB
>>> X, y = load_breast_cancer(return_X_y=True)
>>> estimator = GaussianNB()
>>> selector = BCA(estimator, scoring='accuracy', cv=5)
>>> selector = selector.fit(X, y)
>>> selector.features
[ 1  4  6  7 16 20 21 22 23 27 28]
>>> selector.score
0.971989226626
>>> selector.predict(X[20:25])
[1 1 0 0 0]

Methods

fit
get_params
predict
set_params

fit(X, y, initial_subset=None, fit_estimator=True, verbose=True)

Fit the BCA model to find the best subset of features and, if fit_estimator is True, fit the estimator on that subset.

Parameters:
X : {array-like, sparse matrix}, shape = [n_samples, n_features]

The training input samples.

y : array-like, shape = [n_samples]

The target values.

initial_subset : binary vector, shape = [n_features], optional

The initial feature subset. Defaults to None, which starts the search from the all-zeros vector (no features selected).

fit_estimator : boolean, default True

Whether to fit the estimator on the selected features once the search finishes.

verbose : boolean, default True

Controls the verbosity of the algorithm.

Returns:
self : BCA

The fitted BCA object, with the estimator trained on the selected features when fit_estimator is True.
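The initial_subset argument is described as a binary vector of length n_features. The sketch below shows one way to build such a vector with NumPy and to convert between a binary mask and column indices. It assumes fit accepts a NumPy integer array for initial_subset; the chosen starting features are hypothetical, and the fit call is left as a comment because it requires the bca package.

```python
import numpy as np

n_features = 30  # the breast cancer dataset has 30 features

# All zeros: the documented default (no features selected).
empty_start = np.zeros(n_features, dtype=int)

# Hypothetical alternative start: begin from a few chosen features.
chosen = [0, 2, 7]
custom_start = np.zeros(n_features, dtype=int)
custom_start[chosen] = 1

# Converting a binary mask to column indices and back.
indices = np.flatnonzero(custom_start)  # array([0, 2, 7])
mask_again = np.zeros(n_features, dtype=int)
mask_again[indices] = 1

# Passing it to fit would then look like (assumed, not executed here):
# selector.fit(X, y, initial_subset=custom_start, fit_estimator=True, verbose=False)
```

Starting from a non-empty subset lets the search begin from prior knowledge instead of from no features at all.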

predict(X)

Reduce X to the selected features and then predict using the underlying estimator.

Parameters:
X : array of shape [n_samples, n_features]

The input samples.

Returns:
y : array of shape [n_samples]

The predicted target values.
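The behavior described for predict (reduce X to the selected features, then call the underlying estimator) can be sketched as follows. The helper predict_on_subset is hypothetical, and the sketch assumes the selected features are column indices such as those printed by selector.features in the example above; the estimator must be fitted on the same columns it is later asked to predict on.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.naive_bayes import GaussianNB


def predict_on_subset(estimator, features, X):
    """Hypothetical helper mirroring the documented predict(): reduce, then predict."""
    return estimator.predict(X[:, features])


X, y = load_breast_cancer(return_X_y=True)
# Column indices taken from the selector.features output in the example above.
features = np.array([1, 4, 6, 7, 16, 20, 21, 22, 23, 27, 28])
est = GaussianNB().fit(X[:, features], y)  # fit on the same columns used at predict time
print(predict_on_subset(est, features, X[20:25]))
```

Keeping the column reduction inside one helper avoids the common mistake of passing all features to an estimator trained on a subset.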