What is the Chi-squared Test?
The Chi-squared test is a statistical hypothesis test used to determine whether there is a significant association between two categorical variables in a sample. It compares the observed frequencies in a contingency table with the expected frequencies that would occur if the variables were independent. The larger the discrepancy between the two, the stronger the evidence against independence. The Chi-squared statistic is also commonly used for feature selection in machine learning: scoring each categorical feature against the class label helps identify the features most relevant to a given classification task.
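For an r × c contingency table, this comparison is usually summarized by Pearson's chi-squared statistic. In the formula below, O_ij is the observed count in row i and column j, E_ij is the count expected under independence, R_i and C_j are the row and column totals, and N is the total number of observations. Under the null hypothesis of independence, and assuming the expected counts are not too small, the statistic approximately follows a chi-squared distribution with (r - 1)(c - 1) degrees of freedom.

```latex
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},
\qquad
E_{ij} = \frac{R_i \, C_j}{N},
\qquad
\text{dof} = (r - 1)(c - 1)
```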
Example of using the Chi-squared Test in Python
Here’s a simple example of performing a Chi-squared test using the scipy library in Python:
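```python
import numpy as np
from scipy.stats import chi2_contingency

# Sample 2x3 contingency table: rows are the categories of one variable,
# columns are the categories of the other, and each cell is an observed count
observed = np.array([[10, 20, 30], [20, 30, 20]])

# Perform the Chi-squared test of independence
chi2, p_value, dof, expected = chi2_contingency(observed)

print("Chi-squared statistic:", chi2)
print("P-value:", p_value)
print("Degrees of freedom:", dof)  # (rows - 1) * (columns - 1) = 2 for this table
print("Expected frequencies:", expected)
```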
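This example demonstrates how to use the chi2_contingency function from the scipy.stats module to perform a Chi-squared test on a sample contingency table. The function returns the test statistic, the p-value, the degrees of freedom, and the array of expected frequencies under independence. A small p-value (for example, below 0.05) suggests that the two variables are associated.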
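To make the numbers less of a black box, here is a minimal sketch that recomputes the expected frequencies, the statistic, the degrees of freedom, and the p-value by hand with NumPy, then cross-checks them against chi2_contingency for the same table. The helper name pearson_chi2 is hypothetical; the p-value is taken from the upper tail of the chi-squared distribution via scipy.stats.chi2.sf. The sketch deliberately omits Yates' continuity correction, which chi2_contingency applies by default only when there is exactly one degree of freedom, so the two results should agree for this 2×3 table (2 degrees of freedom).

```python
import numpy as np
from scipy.stats import chi2, chi2_contingency


def pearson_chi2(observed):
    """Hypothetical helper: Pearson chi-squared test of independence, no continuity correction."""
    observed = np.asarray(observed, dtype=float)
    row_totals = observed.sum(axis=1, keepdims=True)  # shape (r, 1)
    col_totals = observed.sum(axis=0, keepdims=True)  # shape (1, c)
    n = observed.sum()

    # Expected counts under independence: E_ij = R_i * C_j / N (broadcast to shape (r, c))
    expected = row_totals * col_totals / n

    # Sum of squared deviations, each scaled by its expected count
    stat = ((observed - expected) ** 2 / expected).sum()
    dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)

    # Upper-tail probability of a chi-squared distribution with `dof` degrees of freedom
    p_value = chi2.sf(stat, dof)
    return stat, p_value, dof, expected


observed = np.array([[10, 20, 30], [20, 30, 20]])
stat, p_value, dof, expected = pearson_chi2(observed)

# Cross-check against SciPy (no Yates correction is applied here because dof != 1)
ref_stat, ref_p, ref_dof, ref_expected = chi2_contingency(observed)
print("statistic matches:", np.isclose(stat, ref_stat))
print("p-value matches:", np.isclose(p_value, ref_p))
print("dof matches:", dof == ref_dof)
print("expected matches:", np.allclose(expected, ref_expected))
```

For a 2×2 table, chi2_contingency(observed, correction=False) would be needed to reproduce this uncorrected statistic.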