What is Bagging?
Bagging, short for Bootstrap Aggregating, is an ensemble technique for improving the stability and accuracy of prediction models. It generates multiple training datasets, called bootstrap samples, by sampling the original training data with replacement. Each sample is usually the same size as the original, so some examples appear several times and others not at all. A separate base model is trained on each sample, and the final prediction aggregates the outputs of all the base models, typically by majority voting for classification or averaging for regression.
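To make the two steps concrete, here is a minimal from-scratch sketch using NumPy and scikit-learn decision trees. The helper names `fit_bagging` and `predict_bagging` are hypothetical, chosen for illustration. Aggregation here is hard majority voting and assumes integer class labels starting at 0 (as in Iris); note that scikit-learn's own `BaggingClassifier` instead averages predicted class probabilities when the base estimator provides `predict_proba`.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def fit_bagging(X, y, n_estimators=10, seed=0):
    """Hypothetical helper: train one tree per bootstrap sample of (X, y)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        # Draw n row indices with replacement: some rows repeat, others are left out.
        idx = rng.integers(0, n, size=n)
        tree = DecisionTreeClassifier(random_state=int(rng.integers(1_000_000)))
        tree.fit(X[idx], y[idx])
        models.append(tree)
    return models


def predict_bagging(models, X):
    """Hypothetical helper: aggregate by majority vote over predicted labels."""
    # Shape (n_estimators, n_samples): one row of predictions per base model.
    all_preds = np.stack([m.predict(X) for m in models])
    # For each sample (column), pick the most frequent label.
    # Assumes labels are non-negative integers, as required by np.bincount.
    return np.array([np.bincount(col).argmax() for col in all_preds.T])


X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
models = fit_bagging(X_train, y_train, n_estimators=10)
y_pred = predict_bagging(models, X_test)
print(f"From-scratch bagging accuracy: {np.mean(y_pred == y_test):.2f}")
```

Swapping the vote for `np.mean(all_preds, axis=0)` over regressor outputs would give the averaging variant used for regression.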
Why use Bagging?
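Bagging is most useful for models with high variance, such as unpruned decision trees, whose predictions change substantially with small changes in the training data. Averaging the predictions of many base models trained on different bootstrap samples reduces this variance, and with it overfitting, while leaving the bias roughly unchanged. The result is usually a more stable and often more accurate ensemble model.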
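A standard way to quantify the variance reduction: suppose the B base predictions at a point x are identically distributed, each with variance σ², and any two have correlation ρ (they are correlated because the bootstrap samples overlap). The variance of their average is then

```latex
\operatorname{Var}\!\left(\frac{1}{B}\sum_{b=1}^{B}\hat f_b(x)\right)
  = \frac{1}{B^2}\Big(B\sigma^2 + B(B-1)\,\rho\,\sigma^2\Big)
  = \rho\,\sigma^2 + \frac{1-\rho}{B}\,\sigma^2
```

For example, with σ² = 1, ρ = 0.5 and B = 10, the ensemble variance is 0.5 + 0.05 = 0.55, compared with 1 for a single model. Adding more models shrinks only the second term, so the variance approaches ρσ² rather than zero, and because the average has the same expectation as each base model, the bias is unchanged.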
Example of Bagging in Python using scikit-learn:
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
# Load the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Create a Bagging classifier using decision trees as base models
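bagging_clf = BaggingClassifier(
    estimator=DecisionTreeClassifier(),  # named base_estimator before scikit-learn 1.2 (removed in 1.4)
    n_estimators=10,  # number of bootstrap samples, i.e. trees in the ensemble
    random_state=42,  # makes the bootstrap sampling reproducible
)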
# Train the Bagging classifier
bagging_clf.fit(X_train, y_train)
# Evaluate the classifier on the test set
accuracy = bagging_clf.score(X_test, y_test)
print(f"Bagging classifier accuracy: {accuracy:.2f}")