What is Stochastic Gradient Descent?
Stochastic Gradient Descent (SGD) is an optimization algorithm used in machine learning and deep learning to minimize a loss function by iteratively updating the model’s parameters. Unlike Batch Gradient Descent, which computes the gradient over the entire dataset, SGD computes the gradient and updates the parameters using only a single training example, or a small subset of examples (a mini-batch), at each iteration. This makes each update far cheaper to compute and the algorithm better suited to large-scale datasets.
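In symbols, with parameters θ, learning rate η, and loss L, batch gradient descent applies the update θ ← θ − η∇L(θ), where the gradient is computed over all training examples, while SGD applies θ ← θ − η∇L(θ; xᵢ, yᵢ) using a single example (xᵢ, yᵢ), or the gradient averaged over a mini-batch.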
How does Stochastic Gradient Descent work?
Stochastic Gradient Descent works by following these steps:
- Randomly shuffle the training dataset.
- For each epoch (one full pass through the training set), iterate over the examples one at a time or in mini-batches.
- Compute the gradient of the loss function with respect to the model parameters using the selected examples.
- Update the model parameters by subtracting the computed gradient multiplied by a learning rate (these steps are illustrated in the sketch below).
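The steps above translate directly into a few lines of NumPy. The following is a minimal sketch of SGD for a one-dimensional linear regression; the synthetic data, learning rate, and epoch count are illustrative assumptions, not values from a real task:

import numpy as np

rng = np.random.default_rng(42)

# Illustrative synthetic data: y = 3x + 2 plus a little noise
X = rng.uniform(-1, 1, size=(200, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(scale=0.1, size=200)

w, b = 0.0, 0.0   # model parameters
eta = 0.1         # learning rate (assumed for the example)
n_epochs = 20

for epoch in range(n_epochs):
    # Step 1: shuffle the training data at the start of each epoch
    for i in rng.permutation(len(X)):
        x_i, y_i = X[i, 0], y[i]
        # Steps 2-3: gradient of the squared error on a single example
        error = (w * x_i + b) - y_i
        grad_w = 2.0 * error * x_i
        grad_b = 2.0 * error
        # Step 4: update the parameters using the learning rate
        w -= eta * grad_w
        b -= eta * grad_b

print(f"learned w={w:.3f}, b={b:.3f}")  # should approach w=3, b=2

After enough passes the learned parameters recover the slope and intercept used to generate the data, which is a quick way to sanity-check an SGD implementation.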
Example of Stochastic Gradient Descent in Python
Here’s a simple example of using Stochastic Gradient Descent with scikit-learn:
from sklearn.linear_model import SGDRegressor
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
# Load the California housing dataset
# (load_boston was removed in scikit-learn 1.2)
housing = fetch_california_housing()
X = housing.data
y = housing.target
# Split the data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Standardize the data
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
# Create an SGDRegressor
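# penalty=None disables regularization; eta0 is the initial learning rate
# for SGDRegressor's default 'invscaling' learning-rate schedule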
sgd_reg = SGDRegressor(max_iter=1000, tol=1e-3, penalty=None, eta0=0.1, random_state=42)
# Train the model
sgd_reg.fit(X_train, y_train)
# Test the model
score = sgd_reg.score(X_test, y_test)
print("R-squared:", score)