Computer Vision

Anomaly detection under distribution shift

Industrial vision systems face persistent challenges when product variants, lighting conditions, or process parameters shift. How reconstruction-based and self-supervised approaches handle this structural problem.


Distribution shift as a structural property of industrial environments

Anomaly detection systems in industrial settings are not deployed into a fixed distribution. Product variants change on production schedules. Tooling wear alters surface textures over months. Seasonal changes affect ambient temperature and humidity, which affect both the products being inspected and the imaging hardware itself. These are not edge cases — they are the normal operating conditions of industrial inspection over any multi-month horizon.

Standard anomaly detection approaches assume that the normal class is stationary: a model trained on normal examples at deployment time will remain valid because normal examples continue to come from the same distribution. This assumption is violated structurally in industrial environments. A model that treats every deviation from its training distribution as anomalous will generate false positives whenever the operating distribution shifts, degrading both precision and operator trust.

Reconstruction-based anomaly detection

Reconstruction-based methods, chiefly autoencoders and variational autoencoders, detect anomalies by learning to reconstruct normal examples and flagging regions where reconstruction error is high. (Normalizing flows are often grouped with these methods, but they score inputs by estimated likelihood rather than by reconstruction residual.) The intuition is that a model trained exclusively on normal data will reconstruct normal inputs faithfully and struggle with anomalous inputs, producing detectable residuals.

This approach generalizes reasonably well under moderate distribution shift because the reconstruction model can be updated on new normal data as the distribution evolves. Incremental updates using recent normal examples allow the model to track distributional drift without full retraining. The principal failure mode is anomalies that are smooth or stylistically consistent with normal examples: in these cases, the model may reconstruct the anomalies well enough that reconstruction residuals are uninformative.
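To make the mechanism concrete, here is a minimal sketch of a reconstruction-based detector using PCA as a linear stand-in for an autoencoder: fit a low-rank subspace on normal data, score inputs by their reconstruction residual, and refit on a recent window of normal examples to track drift. The class name and `update` method are illustrative, not from any particular library.

```python
import numpy as np

class PCAReconstructionDetector:
    """Toy reconstruction-based detector: PCA as a linear autoencoder.

    Anomaly score = per-sample reconstruction error after projecting
    onto the top-k principal components fit on normal data only.
    """

    def __init__(self, n_components=8):
        self.n_components = n_components
        self.mean_ = None
        self.components_ = None

    def fit(self, X_normal):
        self.mean_ = X_normal.mean(axis=0)
        # SVD of centered normal data; rows of vt are principal directions.
        _, _, vt = np.linalg.svd(X_normal - self.mean_, full_matrices=False)
        self.components_ = vt[: self.n_components]
        return self

    def score(self, X):
        centered = X - self.mean_
        # Encode (project) then decode (back-project); residual norm is the score.
        recon = centered @ self.components_.T @ self.components_
        return np.linalg.norm(centered - recon, axis=1)

    def update(self, X_recent_normal):
        # Drift tracking in its simplest form: refit on a sliding window
        # of recent normal examples rather than the original training set.
        return self.fit(X_recent_normal)
```

In production the same pattern appears with deep autoencoders: the `update` step consumes recent unalerted examples, and the alarm threshold is set at a high quantile of scores on held-out normal data.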

Self-supervised approaches and their advantages

Self-supervised anomaly detection methods leverage pretext tasks — predicting image rotations, solving jigsaw puzzles, or using contrastive objectives — to learn representations of normal data without requiring anomaly examples. The resulting representations encode which visual properties are characteristic of normal inputs; anomalies are detected as inputs that are far from the normal manifold in representation space.

Self-supervised methods have two practical advantages over reconstruction-based approaches under distribution shift. First, they tend to produce more semantically meaningful representations — capturing defect-relevant features rather than pixel-level texture — which makes them more robust to irrelevant distribution changes (lighting variations, camera aging). Second, the normal manifold in representation space can be updated incrementally with new normal examples using density estimation or nearest-neighbor methods, without retraining the feature extractor.
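The second advantage can be sketched directly: keep a bounded "memory bank" of normal features from a frozen encoder, score new inputs by mean distance to their k nearest neighbors in the bank, and let appends evict the oldest entries so the normal manifold tracks drift. The class and its interface are hypothetical; features here are plain vectors standing in for encoder outputs.

```python
from collections import deque

import numpy as np

class KNNFeatureDetector:
    """Score anomalies by distance to a normal memory bank in a fixed
    representation space (features assumed to come from a frozen,
    self-supervised encoder).
    """

    def __init__(self, k=5, max_bank=1000):
        self.k = k
        # Sliding window of normal features: appending past max_bank
        # evicts the oldest entries, which is the incremental update.
        self.bank = deque(maxlen=max_bank)

    def add_normal(self, feats):
        for f in feats:
            self.bank.append(np.asarray(f))

    def score(self, feats):
        bank = np.stack(self.bank)
        feats = np.asarray(feats)
        # Pairwise Euclidean distances, then mean over the k nearest neighbors.
        d = np.linalg.norm(feats[:, None, :] - bank[None, :, :], axis=-1)
        k = min(self.k, bank.shape[0])
        return np.sort(d, axis=1)[:, :k].mean(axis=1)
```

Note that only the bank changes over time; the feature extractor that produced the vectors is never retrained, which is exactly what makes the update cheap.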

Monitoring and adaptation in production

Managing anomaly detection under distribution shift in production requires active monitoring of two distinct quantities: the alarm rate (which tracks whether the detector is flagging anomalies at the expected rate) and the feature distribution of normal examples (which tracks whether the operating distribution has shifted). A sudden increase in alarm rate may indicate either a genuine quality event or a distribution shift; distinguishing them requires inspecting the feature distributions of flagged examples.
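A minimal monitor for these two quantities might track the alarm rate with an exponentially weighted moving average and summarize normal-feature drift as the distance between a recent mean and a reference mean. The class, thresholds, and status labels below are illustrative assumptions, not a standard API.

```python
import numpy as np

class ShiftMonitor:
    """Track (1) the alarm rate and (2) drift of the normal feature
    distribution, so a rising alarm rate can be tentatively attributed
    to a quality event versus a distribution shift.
    """

    def __init__(self, expected_alarm_rate=0.01, alpha=0.05):
        self.expected = expected_alarm_rate
        self.alpha = alpha
        self.alarm_ewma = expected_alarm_rate
        self.ref_mean = None

    def set_reference(self, normal_feats):
        self.ref_mean = np.asarray(normal_feats).mean(axis=0)

    def observe(self, is_alarm):
        # EWMA of the binary alarm indicator approximates the alarm rate.
        self.alarm_ewma = (1 - self.alpha) * self.alarm_ewma + self.alpha * float(is_alarm)

    def drift(self, recent_normal_feats):
        # Mean-shift distance: a crude but cheap summary of feature drift.
        recent = np.asarray(recent_normal_feats).mean(axis=0)
        return float(np.linalg.norm(recent - self.ref_mean))

    def status(self, recent_normal_feats, alarm_tol=3.0, drift_tol=1.0):
        high_alarms = self.alarm_ewma > alarm_tol * self.expected
        shifted = self.drift(recent_normal_feats) > drift_tol
        if high_alarms and shifted:
            return "distribution-shift-suspected"
        if high_alarms:
            return "quality-event-suspected"
        return "nominal"
```

The mean-shift statistic is deliberately simple; in practice one would compare full distributions (for example with a kernel two-sample test), but the architectural point stands: alarm rate alone cannot distinguish the two causes.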

Operational robustness comes from building adaptation into the system architecture rather than treating it as a manual maintenance task. Pipelines that automatically enqueue recent unalerted examples for model update, detect distributional anomalies in normal examples before they propagate into false positives, and provide operators with visualizations of how the normal distribution is evolving — these structural properties are what distinguish production-grade anomaly detection systems from research prototypes.
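The "adaptation built into the architecture" idea reduces to a small piece of glue logic: route high-scoring items to operator review, enqueue unalerted items as presumed-normal data, and flush the queue into the detector's update step in batches. The pipeline class and the `update` interface it expects are assumptions for the sketch, not a reference implementation.

```python
class AdaptationPipeline:
    """Glue sketch: each inspected item either raises an alarm or joins
    an update queue; a full queue is pushed to the detector as fresh
    normal data. `detector` is any object exposing an update(items)
    method (hypothetical interface).
    """

    def __init__(self, detector, threshold, batch=32):
        self.detector = detector
        self.threshold = threshold
        self.batch = batch
        self.queue = []

    def process(self, item, score):
        if score > self.threshold:
            # Flagged examples go to operator review, never into updates:
            # otherwise a real quality event would poison the normal model.
            return "alarm"
        self.queue.append(item)
        if len(self.queue) >= self.batch:
            # Unalerted (presumed normal) examples feed the model update.
            self.detector.update(self.queue)
            self.queue = []
        return "pass"
```

The key design choice is the exclusion in the first branch: only unalerted examples are ever used for adaptation, so the normal model tracks drift without absorbing defects.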
