📊 Drift Detection & Monitoring
In MLOps, monitoring is not just about uptime; it’s about Predictive Integrity. We must detect when the model starts lying to us.
🟢 Level 1: Data Drift (Feature Drift)
The distribution of input data changes between training and production.
1. Statistical Tests
- KS Test (Kolmogorov-Smirnov): Used for continuous numerical features.
- Chi-Squared Test: Used for categorical features.
- PSI (Population Stability Index): Measures how much a distribution has shifted.
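The PSI can be computed with nothing but the standard library. Below is a minimal sketch; the equal-width binning over the baseline range and the small epsilon for empty buckets are implementation choices, not part of the PSI definition. Commonly cited rules of thumb: PSI < 0.1 ≈ stable, 0.1–0.25 ≈ moderate shift, > 0.25 ≈ significant shift.

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index between a baseline (expected)
    and a live (actual) sample of a continuous feature."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant feature

    def bucketize(values):
        counts = [0] * bins
        for v in values:
            i = min(int((v - lo) / width), bins - 1)
            counts[max(i, 0)] += 1  # clamp values outside the baseline range
        total = len(values)
        # Epsilon avoids log(0) on empty buckets (proportions may then
        # not sum to exactly 1 -- acceptable for a monitoring sketch).
        return [max(c / total, 1e-4) for c in counts]

    e, a = bucketize(expected), bucketize(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

For the KS and chi-squared tests themselves, `scipy.stats.ks_2samp` and `scipy.stats.chi2_contingency` are the usual off-the-shelf choices.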
2. Metrics for Drift
- Mean/Median Shift: The center of the distribution has moved.
- Variance Shift: The data has become more or less “noisy.”
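Both shifts can be checked directly against a stored baseline. A minimal sketch, where the z-threshold and variance ratio are illustrative defaults rather than standard values:

```python
import statistics

def mean_variance_shift(baseline, live, z_threshold=3.0, var_ratio=2.0):
    """Flag a feature whose live window departs from its baseline.

    Mean shift: the live mean is more than `z_threshold` baseline
    standard deviations from the baseline mean.
    Variance shift: the live/baseline variance ratio leaves the
    band [1/var_ratio, var_ratio].
    """
    b_mean = statistics.mean(baseline)
    b_std = statistics.pstdev(baseline) or 1e-9
    mean_shift = abs(statistics.mean(live) - b_mean) / b_std > z_threshold

    ratio = statistics.pvariance(live) / (statistics.pvariance(baseline) or 1e-9)
    variance_shift = ratio > var_ratio or ratio < 1.0 / var_ratio
    return {"mean_shift": mean_shift, "variance_shift": variance_shift}
```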
🟡 Level 2: Concept Drift
The relationship between the input (X) and the output (y) changes, even when the inputs themselves look the same.
3. The “Luxury” Example
- Training: Feature `Price > $1000` leads to label `Luxury`.
- Production (Inflation): `Price > $1000` is now considered `Mid-Range`.
- Result: The model still predicts `Luxury`, which is now wrong.
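Because the inputs never change in this example, a data-drift monitor stays silent; only comparing predictions against fresh ground truth reveals the problem. A tiny simulation (the labels and years are invented for illustration):

```python
def model(price):
    # Rule learned at training time: anything over $1000 is "Luxury".
    return "Luxury" if price > 1000 else "Standard"

def accuracy(prices, true_labels):
    preds = [model(p) for p in prices]
    return sum(p == t for p, t in zip(preds, true_labels)) / len(prices)

prices = [500, 900, 1200, 1500, 3000]

# Training-era ground truth: the learned rule matches reality.
labels_then = ["Standard", "Standard", "Luxury", "Luxury", "Luxury"]

# Post-inflation ground truth: $1000-$2000 is now Mid-Range, so the
# same inputs carry different labels.
labels_now = ["Standard", "Standard", "Mid-Range", "Mid-Range", "Luxury"]

print(accuracy(prices, labels_then))  # 1.0
print(accuracy(prices, labels_now))   # 0.6
```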
🔴 Level 3: Monitoring Stack
To detect drift at scale, you need an Observability Pipeline.
4. Component Stack
- Data Capture: Logging requests and responses to a DB (e.g., MongoDB or BigQuery).
- Drift Engine: Evidently AI or Great Expectations.
- Metrics Store: Prometheus.
- Visualization: Grafana (dashboards for Data Scientists).
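The Data Capture step boils down to appending each request/response pair somewhere the Drift Engine can read it back. A self-contained sketch using a local JSONL file in place of MongoDB or BigQuery (function names are hypothetical):

```python
import json
import time

def log_prediction(path, features, prediction):
    """Append one request/response row as a JSON line.
    In production this write targets MongoDB or BigQuery instead;
    a local JSONL file keeps the sketch self-contained."""
    row = {"ts": time.time(), "features": features, "prediction": prediction}
    with open(path, "a") as f:
        f.write(json.dumps(row) + "\n")

def load_window(path, since_ts):
    """Read back the rows newer than `since_ts` for the Drift Engine."""
    with open(path) as f:
        rows = [json.loads(line) for line in f]
    return [r for r in rows if r["ts"] >= since_ts]
```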
5. Automated Response
When drift is detected:
- Trigger Slack Alert.
- Tag the production model as “Degraded.”
- Trigger a Continuous Training (CT) pipeline to update the model.
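The three steps can be wired together as a single handler. Every function here is a hypothetical stand-in for your own Slack webhook, model registry, and pipeline trigger; the prints mark where those integrations would go:

```python
def send_slack_alert(message):
    print(f"[slack] {message}")            # e.g. a webhook POST in production

def tag_model(model_name, tag):
    print(f"[registry] {model_name} -> {tag}")

def trigger_ct_pipeline(model_name):
    print(f"[pipeline] retraining {model_name}")

def on_drift(model_name, feature, psi_value, psi_threshold=0.25):
    """Run the three-step response when a feature's PSI crosses threshold."""
    if psi_value <= psi_threshold:
        return False
    send_slack_alert(f"Drift on {feature}: PSI={psi_value:.2f}")
    tag_model(model_name, "Degraded")
    trigger_ct_pipeline(model_name)
    return True
```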
Every production feature should have a “Baseline Profile” stored in the Model Registry. The monitor should compare the “Live Profile” (last 1 hour of traffic) to the “Baseline Profile” every 10 minutes.
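That 10-minute compare loop can be sketched as below. The profile loaders and the drift function are passed in as parameters, since their real implementations depend on your Model Registry and traffic store; `iterations` exists only so the loop can be exercised in tests.

```python
import time

def compare_profiles(baseline, live, drift_fn, threshold=0.25):
    """Compare the live profile to the baseline per feature;
    return the names of the features that drifted."""
    return [f for f in baseline if drift_fn(baseline[f], live[f]) > threshold]

def monitoring_loop(load_baseline, load_live_window, drift_fn,
                    interval_s=600, iterations=None):
    """Every `interval_s` seconds (10 min by default), pull the last hour
    of traffic and compare it to the registry baseline. `iterations`
    bounds the loop for testing; None means run forever."""
    n = 0
    while iterations is None or n < iterations:
        drifted = compare_profiles(load_baseline(), load_live_window(), drift_fn)
        if drifted:
            print("drift detected:", drifted)
        n += 1
        if iterations is None or n < iterations:
            time.sleep(interval_s)
```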