What is Out-of-Distribution Detection?
Out-of-distribution (OOD) detection is the process of identifying data samples that come from a different distribution than the one used to train a machine learning model. It is essential for robustness and reliability, because a model can make unreliable predictions, including confidently wrong ones, on inputs unlike anything it saw during training.
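To make the point about confident errors concrete, here is a minimal sketch using a toy two-class linear classifier with a softmax output. The weight matrix W and both input vectors are made up for illustration and do not come from any trained model; the only point is that an input far from the training data can still receive a confidence close to 1.

```python
import numpy as np

def softmax(logits):
    """Convert a vector of logits into class probabilities."""
    shifted = logits - logits.max()  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

# Hypothetical weights of a 2-class linear classifier (illustrative only)
W = np.array([[1.0, 0.5],
              [-1.0, 0.5]])

x_typical = np.array([1.0, 0.0])   # stands in for an input resembling the training data
x_far = np.array([40.0, -25.0])    # stands in for an input far from the training data

# Confidence = largest softmax probability
conf_typical = softmax(W @ x_typical).max()
conf_far = softmax(W @ x_far).max()

print(f"confidence on typical input: {conf_typical:.3f}")
print(f"confidence on far-away input: {conf_far:.3f}")
```

The far-away input gets the higher confidence even though the model has no basis for a prediction there, which is why a separate OOD check is useful.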
Example of Out-of-Distribution Detection:
Suppose we have trained a deep neural network for handwritten digit recognition on the MNIST dataset, and we want to detect when the model is presented with something other than handwritten digits, such as letters or natural images. In this case, those inputs are OOD samples.
Here’s a Python code example using the pyod package. It uses CIFAR-10 images, rather than letters, as the OOD samples:
import numpy as np
import matplotlib.pyplot as plt
from keras.datasets import mnist
from keras.datasets import cifar10
from pyod.models.ocsvm import OCSVM
# Load MNIST and CIFAR-10 datasets
(x_train_mnist, y_train_mnist), (x_test_mnist, y_test_mnist) = mnist.load_data()
(_, _), (x_test_cifar, y_test_cifar) = cifar10.load_data()
# Preprocess the data
x_train_mnist = x_train_mnist.reshape(-1, 28*28) / 255.0
x_test_mnist = x_test_mnist.reshape(-1, 28*28) / 255.0
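# Convert CIFAR-10 to grayscale and center-crop 32x32 -> 28x28 so the
# feature count (784) matches what the detector is trained on
x_test_cifar = x_test_cifar.mean(axis=3)[:, 2:30, 2:30].reshape(-1, 28*28) / 255.0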
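# Train a One-Class SVM on a random subset of the MNIST training data
# (kernel SVM training scales poorly with sample count; 5,000 is an arbitrary choice for speed)
rng = np.random.default_rng(0)
subset = rng.choice(len(x_train_mnist), size=5000, replace=False)
ocsvm = OCSVM()
ocsvm.fit(x_train_mnist[subset])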
# Score the MNIST and CIFAR-10 test data
# (pyod convention: a higher score means a more anomalous sample)
mnist_scores = ocsvm.decision_function(x_test_mnist)
cifar_scores = ocsvm.decision_function(x_test_cifar)
# Plot the decision scores
plt.hist(mnist_scores, bins='auto', alpha=0.7, label='MNIST (in-distribution)')
plt.hist(cifar_scores, bins='auto', alpha=0.7, label='CIFAR-10 (out-of-distribution)')
plt.xlabel('Anomaly score (higher = more anomalous)')
plt.ylabel('Frequency')
plt.legend()
plt.show()
In this example, we train a One-Class SVM on MNIST training data and score both the MNIST and CIFAR-10 test sets. The histogram overlays the two score distributions. pyod assigns higher scores to more anomalous samples, so a working detector should place most CIFAR-10 scores to the right of the MNIST scores; the less the two histograms overlap, the better the separation.
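A histogram gives only a visual impression of the separation. One common way to quantify it is the AUROC, and one common way to act on the scores is to flag anything above a high quantile of the in-distribution scores. The sketch below shows both under stated assumptions: the helper names auroc and flag_ood are hypothetical, the 5% false-positive target is an arbitrary choice, and the synthetic demo_in and demo_ood arrays only stand in for mnist_scores and cifar_scores so the snippet runs without downloading any data.

```python
import numpy as np

def auroc(in_scores, ood_scores):
    """Probability that a random OOD sample scores higher than a random
    in-distribution sample (ties count as half)."""
    in_sorted = np.sort(np.asarray(in_scores, dtype=float))
    ood_scores = np.asarray(ood_scores, dtype=float)
    # For each OOD score, count in-distribution scores strictly below it
    lower = np.searchsorted(in_sorted, ood_scores, side="left")
    # ... and in-distribution scores less than or equal to it
    upper = np.searchsorted(in_sorted, ood_scores, side="right")
    wins = lower.sum() + 0.5 * (upper - lower).sum()
    return wins / (len(in_sorted) * len(ood_scores))

def flag_ood(scores, in_scores, target_fpr=0.05):
    """Flag samples scoring above the (1 - target_fpr) quantile of the
    in-distribution scores. Returns (boolean flags, threshold)."""
    threshold = np.quantile(in_scores, 1.0 - target_fpr)
    return np.asarray(scores) > threshold, threshold

# Synthetic stand-ins for mnist_scores and cifar_scores (not real results)
rng = np.random.default_rng(0)
demo_in = rng.normal(0.0, 1.0, size=1000)
demo_ood = rng.normal(3.0, 1.0, size=1000)

flags, threshold = flag_ood(demo_ood, demo_in)
print(f"AUROC: {auroc(demo_in, demo_ood):.3f}")
print(f"threshold: {threshold:.3f}, fraction of OOD samples flagged: {flags.mean():.3f}")
```

With the real scores, the calls would be auroc(mnist_scores, cifar_scores) and flag_ood(cifar_scores, mnist_scores). An AUROC near 1.0 indicates good separation, and a value near 0.5 means the scores carry little information. In practice the threshold should be chosen on held-out in-distribution data rather than on the data being evaluated.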