sklearn.ensemble.StackingClassifier

class sklearn.ensemble.StackingClassifier(estimators=None, final_estimator=None, cv=None, method_estimators='auto', n_jobs=1, random_state=None, verbose=0)[source]

Stack of estimators with a final classifier.

Stacked generalization consists in stacking the outputs of the individual estimators and using a classifier to compute the final prediction. Stacking allows combining the strength of each individual estimator by using their outputs as the input of a final estimator. Note that the final estimator is trained on cross-validated predictions of the base estimators.

Read more in the User Guide.
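The cross-validated construction described above can be sketched by hand with `cross_val_predict`. This is an illustrative sketch, not the class's implementation; the choice of base estimators and hyperparameters is arbitrary:

```python
import numpy as np

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Out-of-fold predictions from each base estimator become the
# training features of the final classifier.
base_estimators = [DecisionTreeClassifier(random_state=0),
                   LogisticRegression(max_iter=1000)]
meta_features = np.column_stack([
    cross_val_predict(est, X, y, cv=3, method="predict_proba")
    for est in base_estimators
])

# The final classifier is trained on cross-validated outputs, not on
# predictions of estimators fitted on the full training set, which
# limits overfitting of the stacking layer.
final_estimator = LogisticRegression(max_iter=1000).fit(meta_features, y)
```

Here `meta_features` has shape (150, 6): two estimators, each contributing three class probabilities per sample.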

Parameters:
estimators : list of (string, estimator) tuples

Base estimators which will be stacked together.

final_estimator : estimator object

A classifier which will be used to combine the base estimators.

cv : int, cross-validation generator or an iterable, optional

Determines the cross-validation splitting strategy. Possible inputs for cv are:

  • None, to use the default 3-fold cross validation,
  • integer, to specify the number of folds in a (Stratified) KFold,
  • An object to be used as a cross-validation generator,
  • An iterable yielding (train, test) splits as arrays of indices.

For integer/None inputs, if the estimator is a classifier and y is either binary or multiclass, StratifiedKFold is used. In all other cases, KFold is used.

Refer to the User Guide for the various cross-validation strategies that can be used here.
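As a sketch of passing an explicit cross-validation generator (assuming the constructor arguments shown above; the estimator names 'dt' and 'lr' are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
estimators = [('dt', DecisionTreeClassifier(random_state=0)),
              ('lr', LogisticRegression(max_iter=1000))]

# An explicit generator gives full control over the splitting strategy;
# an int such as cv=5 would request a (Stratified) 5-fold instead.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
clf = StackingClassifier(estimators=estimators,
                         final_estimator=LogisticRegression(max_iter=1000),
                         cv=cv)
clf.fit(X, y)
```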

method_estimators : list of string or 'auto', optional (default='auto')

Methods called for each base estimator. It can be:

  • a list of strings, where each string names the method to invoke on the corresponding estimator;
  • 'auto', in which case predict_proba, decision_function and predict are tried, in that order, for each estimator.
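The 'auto' resolution order described above can be sketched as a small helper (hypothetical, for illustration only — not part of the class API):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC


def resolve_method(estimator):
    """Mirror the 'auto' behaviour: prefer predict_proba, then
    decision_function, then predict."""
    for method in ('predict_proba', 'decision_function', 'predict'):
        if hasattr(estimator, method):
            return method
    raise ValueError("estimator exposes no prediction method")


# LogisticRegression exposes predict_proba, so it is chosen first.
print(resolve_method(LogisticRegression()))  # predict_proba
# LinearSVC has no predict_proba, so decision_function is used.
print(resolve_method(LinearSVC()))           # decision_function
```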

n_jobs : int, optional (default=1)

The number of jobs to run in parallel when fitting the estimators. If -1, the number of jobs is set to the number of cores.

random_state : int, RandomState instance or None, optional (default=None)

If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random. Used to seed the cross-validation splitter (cv).

Attributes:
estimators_ : list of estimator object

The fitted base estimators.

stacking_classifier_ : estimator object

The fitted classifier which stacks the outputs of the base estimators.

method_estimators_ : list of string

The method used by each base estimator.

References

[1] Wolpert, David H. "Stacked generalization." Neural Networks 5.2 (1992): 241-259.

Examples

>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y=True)
>>> from sklearn.linear_model import LogisticRegression
>>> from sklearn.svm import LinearSVC
>>> from sklearn.ensemble import RandomForestClassifier
>>> from sklearn.ensemble import StackingClassifier
>>> estimators = [('lr', LogisticRegression()), ('svr', LinearSVC())]
>>> clf = StackingClassifier(estimators=estimators,
...                          final_estimator=RandomForestClassifier())
>>> from sklearn.model_selection import train_test_split
>>> X_train, X_test, y_train, y_test = train_test_split(X, y)
>>> clf.fit(X_train, y_train).score(X_test, y_test) 
0...

Methods

fit(X, y[, sample_weight]) Fit the estimators.
fit_transform(X[, y]) Fit to data, then transform it.
get_params([deep]) Get the parameters of the stacking estimator.
predict(X) Predict target for X.
predict_proba(X) Predict class probabilities for X.
score(X, y[, sample_weight]) Returns the mean accuracy on the given test data and labels.
set_params(**params) Set the parameters for the stacking estimator.
transform(X) Return class labels or probabilities for X for each estimator.
__init__(estimators=None, final_estimator=None, cv=None, method_estimators='auto', n_jobs=1, random_state=None, verbose=0)[source]
fit(X, y, sample_weight=None)[source]

Fit the estimators.

Parameters:
X : {array-like, sparse matrix}, shape (n_samples, n_features)

Training vectors, where n_samples is the number of samples and n_features is the number of features.

y : array-like, shape (n_samples,)

Target values.

sample_weight : array-like, shape (n_samples,) or None

Sample weights. If None, then samples are equally weighted. Note that this is supported only if all underlying estimators support sample weights.

Returns:
self : object
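A minimal sketch of passing sample_weight through fit, assuming base estimators that all support sample weights (the estimator names and the weighting scheme are illustrative):

```python
import numpy as np

from sklearn.datasets import load_iris
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = StackingClassifier(
    estimators=[('dt', DecisionTreeClassifier(random_state=0)),
                ('lr', LogisticRegression(max_iter=1000))],
    final_estimator=LogisticRegression(max_iter=1000))

# Both base estimators accept sample_weight, so the weights are
# forwarded to each of them during fitting.
weights = np.ones(len(y))
weights[y == 0] = 2.0  # up-weight the first class
clf.fit(X, y, sample_weight=weights)
```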
fit_transform(X, y=None, **fit_params)[source]

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
X : numpy array of shape [n_samples, n_features]

Training set.

y : numpy array of shape [n_samples]

Target values.

Returns:
X_new : numpy array of shape [n_samples, n_features_new]

Transformed array.

get_params(deep=True)[source]

Get the parameters of the stacking estimator.

Parameters:
deep : bool

If True, return the parameters of the base estimators in addition to the parameters of this estimator.

predict(X)[source]

Predict target for X.

Parameters:
X : {array-like, sparse matrix}, shape (n_samples, n_features)

Input samples, where n_samples is the number of samples and n_features is the number of features.

Returns:
y_pred : ndarray, shape (n_samples,)

Predicted targets.

predict_proba(X)[source]

Predict class probabilities for X.

Parameters:
X : {array-like, sparse matrix} of shape (n_samples, n_features)

Input samples, where n_samples is the number of samples and n_features is the number of features.

Returns:
probabilities : ndarray, shape (n_samples, n_classes)

The class probabilities of the input samples.
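A short sketch of predict_proba, assuming a final estimator that itself exposes predict_proba (estimator names and hyperparameters are illustrative):

```python
import numpy as np

from sklearn.datasets import load_iris
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = StackingClassifier(
    estimators=[('dt', DecisionTreeClassifier(random_state=0)),
                ('lr', LogisticRegression(max_iter=1000))],
    final_estimator=LogisticRegression(max_iter=1000)).fit(X, y)

# One column per class; each row is a probability distribution.
proba = clf.predict_proba(X)
print(proba.shape)  # (150, 3)
```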

score(X, y, sample_weight=None)[source]

Returns the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.

Parameters:
X : array-like, shape = (n_samples, n_features)

Test samples.

y : array-like, shape = (n_samples) or (n_samples, n_outputs)

True labels for X.

sample_weight : array-like, shape = [n_samples], optional

Sample weights.

Returns:
score : float

Mean accuracy of self.predict(X) wrt. y.

set_params(**params)[source]

Set the parameters for the stacking estimator.

Valid parameter keys can be listed with get_params().

Parameters:
params : keyword arguments

Specific parameters using e.g. set_params(parameter_name=new_value). In addition to setting the parameters of the StackingClassifier, the individual classifiers can also be set, or removed by setting them to None.

Examples

>>> # In this example, the RandomForestClassifier is removed
>>> clf1 = LogisticRegression()
>>> clf2 = RandomForestClassifier()
>>> eclf = StackingClassifier(estimators=[('lr', clf1), ('rf', clf2)])
>>> eclf.set_params(rf=None)

transform(X)[source]

Return class labels or probabilities for X for each estimator.

Parameters:
X : {array-like, sparse matrix}, shape (n_samples, n_features)

Input samples, where n_samples is the number of samples and n_features is the number of features.

Returns:
y_preds : ndarray, shape (n_samples, n_estimators) or (n_samples, n_estimators * n_classes)

Prediction outputs for each estimator. The number of columns per estimator depends on the method called on it: a single column for predict, one column per class for predict_proba.
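A short sketch of transform with the released scikit-learn implementation (estimator names and hyperparameters are illustrative). Since both base estimators expose predict_proba here, each contributes one column per class:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = StackingClassifier(
    estimators=[('dt', DecisionTreeClassifier(random_state=0)),
                ('lr', LogisticRegression(max_iter=1000))],
    final_estimator=LogisticRegression(max_iter=1000)).fit(X, y)

# 2 estimators x 3 class probabilities = 6 stacked columns.
stacked = clf.transform(X)
print(stacked.shape)  # (150, 6)
```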