Non-negative Matrix Factorization (NMF)
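Non-negative Matrix Factorization (NMF) is a dimensionality reduction and data analysis technique. It approximates a non-negative matrix V (m × n) as the product of two smaller non-negative matrices, W (m × k) and H (k × n), where the number of latent features k is usually much smaller than m and n. Because all entries are non-negative, each row of V is reconstructed as an additive combination of the k rows of H, which tends to produce interpretable, parts-based features. NMF is particularly useful in image processing, text mining, and recommendation systems, where the data are naturally non-negative (pixel intensities, word counts or TF-IDF weights, ratings).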
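To make the decomposition concrete, here is a minimal NumPy sketch of one classic way to compute it: the Lee-Seung multiplicative update rules, which minimize the squared reconstruction error ‖V − WH‖²_F. This is an illustration only and is not necessarily what scikit-learn does internally. Its NMF class offers coordinate-descent and multiplicative-update solvers. The function name `nmf_multiplicative`, the toy matrix `V`, and the parameter values are hypothetical choices for this sketch.

```python
import numpy as np

def nmf_multiplicative(V, k, n_iter=200, eps=1e-10, seed=0):
    """Approximate non-negative V (m x n) as W @ H, with W (m x k) and H (k x n).

    Sketch of the Lee-Seung multiplicative updates for the squared Frobenius
    loss ||V - W @ H||_F^2. Entries stay non-negative because each update
    multiplies by a ratio of non-negative quantities.
    """
    rng = np.random.default_rng(seed)
    m, n = V.shape
    # Non-negative random starting values in [0, 1)
    W = rng.random((m, k))
    H = rng.random((k, n))
    for _ in range(n_iter):
        # Update H, then W; eps guards against division by zero
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Hypothetical toy data: 4 samples x 3 non-negative features
V = np.array([[1.0, 0.0, 2.0],
              [0.0, 3.0, 1.0],
              [2.0, 1.0, 0.0],
              [1.0, 1.0, 1.0]])

W, H = nmf_multiplicative(V, k=2)
print(W.shape, H.shape)            # (4, 2) (2, 3)
print(np.linalg.norm(V - W @ H))   # Frobenius reconstruction error
```

Because every update multiplies the current entries by a non-negative ratio, W and H never become negative. For the same reason, an entry that starts at exactly zero stays zero, which is why the sketch initializes with positive random values rather than zeros.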
Example
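In this example, we’ll use scikit-learn to extract topics from a small collection of documents. The documents are first converted into a TF-IDF matrix, which NMF then factors into document-topic weights (W) and topic-term weights (H).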
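from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF

documents = [
    "The quick brown fox jumps over the lazy dog.",
    "I enjoy reading about machine learning and natural language processing.",
    "The weather is sunny today, perfect for a walk in the park.",
    "Deep learning is a popular subfield of machine learning."
]

# Convert the documents into a TF-IDF matrix (documents x terms), dropping English stop words
vectorizer = TfidfVectorizer(stop_words='english')
X = vectorizer.fit_transform(documents)

# Factor X into W (documents x topics) and H (topics x terms)
nmf = NMF(n_components=2, random_state=42)
W = nmf.fit_transform(X)
H = nmf.components_

# Print the topics and their top words
# (get_feature_names_out replaces get_feature_names, which was removed in scikit-learn 1.2)
feature_names = vectorizer.get_feature_names_out()
for i, topic in enumerate(H):
    print(f"Topic {i + 1}:")
    # argsort is ascending, so the last five indices are the five highest-weighted terms
    print(" ".join([feature_names[index] for index in topic.argsort()[-5:]]))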
Output:
Topic 1:
park walk weather sunny today
Topic 2:
language natural processing learning machine
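The factors can also be inspected per document. The sketch below repeats the pipeline above so it runs on its own. It assigns each document to the topic with the largest weight in its row of W and reports the reconstruction error through scikit-learn's `reconstruction_err_` attribute and `inverse_transform` method. The name `dominant_topic` is illustrative. Which topic each document lands on depends on the fitted model, so no output is shown. A document whose row of W is close to zero, such as one sharing no terms with the others, may not be explained well by either topic.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF

documents = [
    "The quick brown fox jumps over the lazy dog.",
    "I enjoy reading about machine learning and natural language processing.",
    "The weather is sunny today, perfect for a walk in the park.",
    "Deep learning is a popular subfield of machine learning.",
]

X = TfidfVectorizer(stop_words="english").fit_transform(documents)
nmf = NMF(n_components=2, random_state=42)
W = nmf.fit_transform(X)   # document-topic weights, shape (4, 2)
H = nmf.components_        # topic-term weights, shape (2, n_terms)

# Assign each document to the topic with the largest weight in its row of W
dominant_topic = W.argmax(axis=1)
for doc, topic in zip(documents, dominant_topic):
    print(f"Topic {topic + 1}: {doc}")

# Frobenius norm of X - W @ H, computed by scikit-learn during fitting
print("Reconstruction error:", nmf.reconstruction_err_)

# Dense approximation of X, equal to W @ H
X_approx = nmf.inverse_transform(W)
```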