📊 Drift Detection & Monitoring
In MLOps, monitoring is not just about uptime; it’s about Predictive Integrity. We must detect when the model starts lying to us.
🟢 Level 1: Data Drift (Feature Drift)
The distribution of input data changes between training and production.
1. Statistical Tests
- KS Test (Kolmogorov-Smirnov): Used for continuous numerical features.
- Chi-Squared Test: Used for categorical features.
- PSI (Population Stability Index): Measures how much a distribution has shifted.
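The PSI can be computed with nothing but the standard library. Below is a minimal sketch; the equal-width binning over the baseline range and the small epsilon for empty buckets are implementation choices, not part of the PSI definition. Commonly cited rules of thumb: PSI < 0.1 ≈ stable, 0.1–0.25 ≈ moderate shift, > 0.25 ≈ significant shift.

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index between a baseline (expected)
    and a live (actual) sample of a continuous feature."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant feature

    def bucketize(values):
        counts = [0] * bins
        for v in values:
            i = min(int((v - lo) / width), bins - 1)
            counts[max(i, 0)] += 1  # clamp values outside the baseline range
        total = len(values)
        # Epsilon avoids log(0) on empty buckets (proportions may then
        # not sum to exactly 1 -- acceptable for a monitoring sketch).
        return [max(c / total, 1e-4) for c in counts]

    e, a = bucketize(expected), bucketize(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

For the KS and chi-squared tests themselves, `scipy.stats.ks_2samp` and `scipy.stats.chi2_contingency` are the usual off-the-shelf choices.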
2. Metrics for Drift
- Mean/Median Shift: The center of the distribution has moved.
- Variance Shift: The data has become more or less “noisy.”
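Both shifts can be checked directly against a stored baseline. A minimal sketch, where the z-threshold and variance ratio are illustrative defaults rather than standard values:

```python
import statistics

def mean_variance_shift(baseline, live, z_threshold=3.0, var_ratio=2.0):
    """Flag a feature whose live window departs from its baseline.

    Mean shift: the live mean is more than `z_threshold` baseline
    standard deviations from the baseline mean.
    Variance shift: the live/baseline variance ratio leaves the
    band [1/var_ratio, var_ratio].
    """
    b_mean = statistics.mean(baseline)
    b_std = statistics.pstdev(baseline) or 1e-9
    mean_shift = abs(statistics.mean(live) - b_mean) / b_std > z_threshold

    ratio = statistics.pvariance(live) / (statistics.pvariance(baseline) or 1e-9)
    variance_shift = ratio > var_ratio or ratio < 1.0 / var_ratio
    return {"mean_shift": mean_shift, "variance_shift": variance_shift}
```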
🟡 Level 2: Concept Drift
The relationship between the input (X) and the output (y) changes, even when the inputs themselves look the same.
3. The “Luxury” Example
- Training: Feature `Price > $1000` leads to label `Luxury`.
- Production (Inflation): `Price > $1000` is now considered `Mid-Range`.
- Result: The model still predicts `Luxury`, which is now wrong.
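Because the inputs never change in this example, a data-drift monitor stays silent; only comparing predictions against fresh ground truth reveals the problem. A tiny simulation (the labels and years are invented for illustration):

```python
def model(price):
    # Rule learned at training time: anything over $1000 is "Luxury".
    return "Luxury" if price > 1000 else "Standard"

def accuracy(prices, true_labels):
    preds = [model(p) for p in prices]
    return sum(p == t for p, t in zip(preds, true_labels)) / len(prices)

prices = [500, 900, 1200, 1500, 3000]

# Training-era ground truth: the learned rule matches reality.
labels_then = ["Standard", "Standard", "Luxury", "Luxury", "Luxury"]

# Post-inflation ground truth: $1000-$2000 is now Mid-Range, so the
# same inputs carry different labels.
labels_now = ["Standard", "Standard", "Mid-Range", "Mid-Range", "Luxury"]

print(accuracy(prices, labels_then))  # 1.0
print(accuracy(prices, labels_now))   # 0.6
```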
🔴 Level 3: Monitoring Stack
To detect drift at scale, you need an Observability Pipeline.
4. Component Stack
- Data Capture: Logging requests and responses to a DB (e.g., MongoDB or BigQuery).
- Drift Engine: Evidently AI or Great Expectations.
- Metrics Store: Prometheus.
- Visualization: Grafana (dashboards for Data Scientists).
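The Data Capture step boils down to appending each request/response pair somewhere the Drift Engine can read it back. A self-contained sketch using a local JSONL file in place of MongoDB or BigQuery (function names are hypothetical):

```python
import json
import time

def log_prediction(path, features, prediction):
    """Append one request/response row as a JSON line.
    In production this write targets MongoDB or BigQuery instead;
    a local JSONL file keeps the sketch self-contained."""
    row = {"ts": time.time(), "features": features, "prediction": prediction}
    with open(path, "a") as f:
        f.write(json.dumps(row) + "\n")

def load_window(path, since_ts):
    """Read back the rows newer than `since_ts` for the Drift Engine."""
    with open(path) as f:
        rows = [json.loads(line) for line in f]
    return [r for r in rows if r["ts"] >= since_ts]
```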
5. Automated Response
When drift is detected:
- Trigger Slack Alert.
- Tag the production model as “Degraded.”
- Trigger a Continuous Training (CT) pipeline to update the model.
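The three steps can be wired together as a single handler. Every function here is a hypothetical stand-in for your own Slack webhook, model registry, and pipeline trigger; the prints mark where those integrations would go:

```python
def send_slack_alert(message):
    print(f"[slack] {message}")            # e.g. a webhook POST in production

def tag_model(model_name, tag):
    print(f"[registry] {model_name} -> {tag}")

def trigger_ct_pipeline(model_name):
    print(f"[pipeline] retraining {model_name}")

def on_drift(model_name, feature, psi_value, psi_threshold=0.25):
    """Run the three-step response when a feature's PSI crosses threshold."""
    if psi_value <= psi_threshold:
        return False
    send_slack_alert(f"Drift on {feature}: PSI={psi_value:.2f}")
    tag_model(model_name, "Degraded")
    trigger_ct_pipeline(model_name)
    return True
```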
Every production feature should have a “Baseline Profile” stored in the Model Registry. The monitor should compare the “Live Profile” (last 1 hour of traffic) to the “Baseline Profile” every 10 minutes.
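That 10-minute compare loop can be sketched as below. The profile loaders and the drift function are passed in as parameters, since their real implementations depend on your Model Registry and traffic store; `iterations` exists only so the loop can be exercised in tests.

```python
import time

def compare_profiles(baseline, live, drift_fn, threshold=0.25):
    """Compare the live profile to the baseline per feature;
    return the names of the features that drifted."""
    return [f for f in baseline if drift_fn(baseline[f], live[f]) > threshold]

def monitoring_loop(load_baseline, load_live_window, drift_fn,
                    interval_s=600, iterations=None):
    """Every `interval_s` seconds (10 min by default), pull the last hour
    of traffic and compare it to the registry baseline. `iterations`
    bounds the loop for testing; None means run forever."""
    n = 0
    while iterations is None or n < iterations:
        drifted = compare_profiles(load_baseline(), load_live_window(), drift_fn)
        if drifted:
            print("drift detected:", drifted)
        n += 1
        if iterations is None or n < iterations:
            time.sleep(interval_s)
```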