Why this matters: A logistic regression model is comfortable extrapolating into nonsense. Feed it a feature vector with VIX=80 when it's never seen VIX above 25, and it'll still produce a confident probability. That confidence is fake.

What this does: Tracks a running mean and standard deviation for each of the 22 features using Welford's online algorithm. When a new prediction is requested, it computes a z-score per feature. If too many features are >3σ from the training distribution, the input is flagged out-of-distribution (OOD) and the prediction confidence is pulled back toward 50% (no signal).
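A minimal sketch of the per-feature Welford update described above (class and field names here are illustrative, not the actual OutlierDetector internals):

```javascript
// Running mean/std/min/max for one feature via Welford's online algorithm.
// Sketch only; the real detector's field names may differ.
class FeatureStats {
  constructor() {
    this.n = 0;
    this.mean = 0;
    this.m2 = 0; // sum of squared deviations from the running mean
    this.min = Infinity;
    this.max = -Infinity;
  }
  update(x) {
    this.n += 1;
    const delta = x - this.mean;
    this.mean += delta / this.n;
    this.m2 += delta * (x - this.mean); // second factor uses the updated mean
    this.min = Math.min(this.min, x);
    this.max = Math.max(this.max, x);
  }
  std() {
    return this.n > 1 ? Math.sqrt(this.m2 / (this.n - 1)) : 0;
  }
  z(x) {
    const s = this.std();
    return s > 0 ? (x - this.mean) / s : 0;
  }
}
```

Welford's formulation avoids the catastrophic cancellation of the naive sum/sum-of-squares approach, which matters for stats that accumulate over thousands of captures.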

Result: the brain becomes honest about what it doesn't know. On normal days, predictions go through unchanged. On crisis days, the brain says "I haven't seen anything like this: 50/50, stay out."
Observations tracked: 0 · Features flagged now: 0 · Current OOD score: — · 24h OOD ratio: —
📊 OOD score over time (last 200 captures)
⚙ Detector parameters
Z_THRESHOLD = 3.0σ // single-feature flag
OOD_TRIGGER = 40% // of features flagged → OOD
MIN_N_FOR_STATS = 30 // before flagging starts
CONFIDENCE_PULL = 50% // pull toward 0.5 when OOD
These thresholds are conservative: the detector won't false-positive on minor noise but will catch real distribution shifts (crisis days, market reopens, etc.).
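Putting the thresholds together: with 22 features and a 40% trigger, an input is flagged OOD once 9 of the 22 z-scores exceed 3σ (9/22 ≈ 41%), and nothing is flagged before 30 observations. A hedged sketch of that decision rule (the function name is illustrative; the constants mirror the parameters above):

```javascript
// Hedged sketch of the OOD flagging rule; constants mirror the dashboard parameters.
const Z_THRESHOLD = 3.0;     // single-feature flag
const OOD_TRIGGER = 0.40;    // fraction of flagged features that trips OOD
const MIN_N_FOR_STATS = 30;  // observations required before flagging starts

function isOOD(zScores, nObservations) {
  if (nObservations < MIN_N_FOR_STATS) return false; // too little data to judge
  const flagged = zScores.filter(z => Math.abs(z) > Z_THRESHOLD).length;
  return flagged / zScores.length >= OOD_TRIGGER;
}
```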
Per-feature statistics (table columns: FEAT · NAME · MEAN · STD · MIN / MAX · STATUS)
🔬 How this integrates
1. Every capture updates stats: continuous-learner.js calls OutlierDetector.update(features) on every prediction snapshot, refining each feature's running mean/std/min/max with the new sample.
2. Every prediction gets scored: OutlierDetector.oodScore(features) returns a score in [0, 1]. Zero means the input is right in the middle of what we've seen; one means it is extremely far outside the training distribution.
3. High OOD scores pull confidence to 50%: if score > 0.5, predProb = 0.5 + (predProb - 0.5) × (1 - score). The brain stops being confident on things it hasn't seen.
4. Logged to OOD log: every prediction's (ts, oodScore, predProb) is saved for trending. The 24h OOD ratio surfaces whether we're in a normal or anomalous regime.
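Steps 3 and 4 can be sketched end to end. This is an illustrative reimplementation, not the actual continuous-learner.js code; the oodLog array and helper names are assumptions:

```javascript
// Step 3: shrink a model probability toward 0.5 in proportion to the OOD score.
// Only fires when score > 0.5, matching the rule above. Sketch only.
function adjustConfidence(predProb, oodScore) {
  if (oodScore <= 0.5) return predProb;            // in-distribution: unchanged
  return 0.5 + (predProb - 0.5) * (1 - oodScore);  // shrink toward no-signal
}

// Step 4: append (ts, oodScore, predProb) and derive the 24h OOD ratio.
const oodLog = []; // hypothetical in-memory log for trending
function logPrediction(ts, oodScore, predProb) {
  oodLog.push({ ts, oodScore, predProb });
}
function oodRatio24h(now) {
  const dayAgo = now - 24 * 60 * 60 * 1000;
  const recent = oodLog.filter(e => e.ts >= dayAgo);
  if (recent.length === 0) return null; // no data yet, shown as "—"
  return recent.filter(e => e.oodScore > 0.5).length / recent.length;
}
```

Note the shrink is proportional: at oodScore = 1 the output is exactly 0.5 regardless of the raw prediction, while a score of 0.6 still preserves most of the signal.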