What is Regression?
Regression is a statistical method used in machine learning and data analysis to model the relationship between a dependent variable (often called the target) and one or more independent variables (often called features or predictors). It describes how the value of the dependent variable changes as the values of the independent variables change. The main goal of regression is to create a model that can predict the value of the dependent variable, typically a continuous number, from the values of the independent variables.
Example of a Regression Problem
Suppose we have the following dataset showing the number of hours studied and the corresponding exam scores for a group of students:
Hours_Studied | Exam_Score |
---|---|
1 | 20 |
2 | 40 |
3 | 60 |
4 | 80 |
We can use linear regression to create a model that predicts the exam score based on the number of hours studied. In this case, the dependent variable is the exam score, and the independent variable is the number of hours studied.
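To make the model concrete, the fit can be sketched as a straight line estimated by ordinary least squares. The symbols $\beta_0$, $\beta_1$, $x_i$, and $y_i$ are notation introduced here for illustration and are not part of the original example; $x$ stands for Hours_Studied and $y$ for Exam_Score.

$$
\hat{y} = \beta_0 + \beta_1 x, \qquad
\beta_1 = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}, \qquad
\beta_0 = \bar{y} - \beta_1 \bar{x}
$$

For the four rows in the table, $\bar{x} = 2.5$ and $\bar{y} = 50$, so:

$$
\beta_1 = \frac{(-1.5)(-30) + (-0.5)(-10) + (0.5)(10) + (1.5)(30)}{(-1.5)^2 + (-0.5)^2 + (0.5)^2 + (1.5)^2} = \frac{100}{5} = 20, \qquad
\beta_0 = 50 - 20 \cdot 2.5 = 0
$$

Under these assumptions the fitted line is Exam_Score = 20 × Hours_Studied, which passes through all four data points exactly.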
Here is a simple Python example that performs linear regression using the scikit-learn library:
import numpy as np
from sklearn.linear_model import LinearRegression
# Define the independent variable (hours studied) and dependent variable (exam score)
X = np.array([1, 2, 3, 4]).reshape(-1, 1)
y = np.array([20, 40, 60, 80])
# Create a linear regression model
model = LinearRegression()
# Fit the model to the data
model.fit(X, y)
# Use the model to predict the exam score for a student who studied for 5 hours
prediction = model.predict(np.array([5]).reshape(-1, 1))
print("Predicted exam score:", prediction)
This code would output:
Predicted exam score: [100.]
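The linear regression model predicts that a student who studied for 5 hours would score 100 on the exam. The four data points lie exactly on the line Exam_Score = 20 × Hours_Studied, so the fitted model reproduces that line. Note that 5 hours lies outside the observed range of 1 to 4 hours, so this prediction is an extrapolation and assumes the same linear trend continues.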
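It can also help to inspect the fitted line itself rather than only a single prediction. The sketch below refits the same data and reads scikit-learn's `coef_` and `intercept_` attributes, along with the `score` method, which returns the R² value for regressors. The variable names `slope`, `intercept`, and `r2` are illustrative choices, not part of the original example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Same data as in the example: hours studied and exam scores
X = np.array([1, 2, 3, 4]).reshape(-1, 1)
y = np.array([20, 40, 60, 80])

# Fit an ordinary least squares line to the data
model = LinearRegression().fit(X, y)

# Parameters of the fitted line: score = intercept + slope * hours
slope = model.coef_[0]
intercept = model.intercept_
print(f"slope: {slope:.4f}")
print(f"intercept: {intercept:.4f}")

# R^2 on the training data (1.0 would mean a perfect fit)
r2 = model.score(X, y)
print(f"R^2: {r2:.4f}")
```

For this dataset the slope should come out at about 20, the intercept at about 0 (up to floating-point rounding), and R² at about 1, since the points are perfectly collinear. Real data is rarely this clean, so an R² below 1 and a nonzero intercept are the usual case.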