Model Monitoring: Detecting Drift Before Disaster
Models Degrade Silently
A production model rarely fails catastrophically: it degrades gradually. The distribution of the input data shifts (data drift), the relationship between features and target evolves (concept drift), and the distribution of the model's outputs moves away from what was seen at validation time (prediction drift). Without active monitoring, these degradations go unnoticed until the business impact is visible, and by then the damage is already done.
Three Types of Drift and How to Measure Them
Whichever drift statistic is used (PSI, the KS test, KL divergence), the important point is not the specific formula but the behavior it triggers: if the signals supporting a decision change, the system should lower confidence, trigger review, and preserve evidence to explain the change.
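As an illustration of measuring data drift on a single numeric feature, here is a minimal sketch of the Population Stability Index (PSI) using only the Python standard library. The function name `psi`, the choice of ten quantile bins, and the `eps` floor for empty bins are assumptions made for this example, not a prescribed configuration.

```python
import bisect
import math


def psi(reference, current, n_bins=10, eps=1e-6):
    """Population Stability Index between two samples of one numeric feature.

    reference: values seen at training/validation time
    current:   values seen in a recent production window
    """
    ref_sorted = sorted(reference)
    # Interior bin edges sit at the reference quantiles, so each bin
    # holds roughly 1/n_bins of the reference sample.
    edges = [ref_sorted[len(ref_sorted) * i // n_bins] for i in range(1, n_bins)]

    def bin_fractions(values):
        counts = [0] * n_bins
        for v in values:
            # bisect_right returns how many edges are <= v, i.e. the bin index.
            counts[bisect.bisect_right(edges, v)] += 1
        # Floor at eps so an empty bin cannot cause log(0) or division by zero.
        return [max(c / len(values), eps) for c in counts]

    p = bin_fractions(reference)
    q = bin_fractions(current)
    # PSI = sum over bins of (q_i - p_i) * ln(q_i / p_i)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))
```

A commonly quoted rule of thumb treats a PSI below 0.1 as stable and above 0.25 as a significant shift. Those cutoffs are conventions rather than guarantees, so they are best treated as starting points to tune per feature.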
Alerts, Review, and Continuous Assurance
When evidence degrades, the system should react in layers: alert, compare scenarios, block high-impact recommendations when needed, request human review, and record the outcome. This operating response matters more than any single technical configuration.
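The layered reaction described above could be sketched as a small policy function. Everything here is hypothetical: the name `respond_to_drift`, the action labels, and the `warn_at` / `block_at` thresholds are illustrative assumptions, and a real system would dispatch to its own alerting and review tooling.

```python
from datetime import datetime, timezone


def respond_to_drift(feature, score, high_impact, warn_at=0.1, block_at=0.25):
    """Return the ordered response layers for one drift signal, plus an audit record.

    score:       a drift statistic for the feature (for example PSI)
    high_impact: whether the affected recommendation is high-impact
    warn_at / block_at: illustrative thresholds, assumed for this sketch
    """
    actions = []
    if score >= warn_at:
        # First layer: notify and put current vs. reference behavior side by side.
        actions.append("alert")
        actions.append("compare_scenarios")
    if score >= block_at:
        # Second layer: stop high-impact output and bring in a human.
        if high_impact:
            actions.append("block_recommendation")
        actions.append("request_human_review")
    # The outcome is always recorded, even when nothing fired,
    # so the evidence needed to explain a later change is preserved.
    actions.append("record_outcome")
    audit_record = {
        "feature": feature,
        "score": score,
        "high_impact": high_impact,
        "actions": list(actions),
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    return actions, audit_record
```

Keeping the policy as a pure function that returns actions, rather than performing them, makes the escalation logic easy to test and to audit separately from the alerting infrastructure.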
Key Takeaways
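- Data drift (a change in P(X)) is detectable without labels and should be monitored with statistics such as PSI, the Kolmogorov-Smirnov (KS) test, or KL divergence.
- Concept drift (a change in P(Y|X)) can only be confirmed with ground-truth labels, which makes it the most dangerous: the inputs can look normal while the model fails silently.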
- Auto-retraining must be gated by evaluation suites, not blindly triggered by drift.
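To make the last takeaway concrete, here is a minimal sketch of an evaluation gate in which drift only proposes a retrained candidate and never promotes it directly. The function names, the metric dictionaries (higher is better), and the `tolerance` value are assumptions for illustration.

```python
def evaluate_candidate(candidate, production, tolerance=0.01):
    """Compare metric dicts (higher is better) and return the list of failed checks."""
    failures = []
    for metric, prod_score in production.items():
        cand_score = candidate.get(metric)
        if cand_score is None:
            # A metric that was not evaluated counts as a failure, not a pass.
            failures.append(f"{metric}: missing from candidate evaluation")
        elif cand_score < prod_score - tolerance:
            failures.append(f"{metric}: {cand_score:.3f} < {prod_score:.3f}")
    return failures


def decide_promotion(drift_detected, candidate, production, tolerance=0.01):
    """Drift triggers an evaluation; only a passing evaluation triggers promotion."""
    if not drift_detected:
        return "keep_production", []
    failures = evaluate_candidate(candidate, production, tolerance)
    if failures:
        return "reject_candidate", failures
    return "promote_candidate", []


# Example with made-up metric values: overall AUC improves, but recall on a
# minority segment regresses, so the candidate is rejected.
production = {"auc": 0.80, "recall_minority": 0.60}
candidate = {"auc": 0.85, "recall_minority": 0.40}
decision, reasons = decide_promotion(True, candidate, production)
```

Requiring every production metric to hold within the tolerance means a candidate cannot buy a headline improvement with a regression on a slice that matters.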
