Calibration = does the model say what it means? A well-calibrated model that outputs 70% confidence should actually be right 70% of the time on that batch. If the model says 80% but only wins 60% of those, it's overconfident, and your sizing will be too aggressive. If it says 60% but wins 80%, it's underconfident: you're leaving edge on the table.
This page bins predictions by confidence (50-60%, 60-70%, etc.) and plots actual win rate per bin against the diagonal. Closer to diagonal = better calibration.
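A minimal sketch of that binning, assuming each rated prediction arrives as a (confidence, won) pair; the `bin_calibration` name and the exact bin edges are illustrative, not the page's internals:

```python
import numpy as np

def bin_calibration(conf, won, edges=(0.50, 0.60, 0.70, 0.80, 0.90, 1.00)):
    """Per-bin predicted vs actual win rate.

    conf: stated confidences in [0.5, 1.0]; won: 0/1 outcomes.
    """
    conf, won = np.asarray(conf, float), np.asarray(won, float)
    rows = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        # last bin is closed on the right so 100% confidence isn't dropped
        mask = (conf >= lo) & (conf <= hi if hi == edges[-1] else conf < hi)
        if mask.any():
            rows.append({
                "bin": f"{lo:.0%}-{hi:.0%}",
                "n": int(mask.sum()),
                "predicted": float(conf[mask].mean()),  # avg stated confidence
                "actual": float(won[mask].mean()),      # realized hit rate
            })
    return rows
```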
Summary stats shown at the top of the page:
  • Brier Score: lower = better calibration
  • ECE: Expected Calibration Error (sample-weighted gap between predicted and actual win rate per bin)
  • Avg Overconfidence: predicted % - actual % (positive = overconfident)
  • Total Rated: number of predictions with a recorded win/loss outcome
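How those four numbers could be computed; a sketch assuming the same arrays of stated confidence and 0/1 outcomes, with the `calibration_summary` name and the 0.5-1.0 bin range as assumptions:

```python
import numpy as np

def calibration_summary(conf, won, n_bins=5):
    """Brier score, ECE, and average overconfidence over rated predictions."""
    conf, won = np.asarray(conf, float), np.asarray(won, float)
    brier = float(np.mean((conf - won) ** 2))   # squared error of the stated probability
    edges = np.linspace(0.5, 1.0, n_bins + 1)
    idx = np.clip(np.digitize(conf, edges) - 1, 0, n_bins - 1)
    # ECE: per-bin |predicted - actual| gap, weighted by the share of samples in the bin
    ece = sum(
        (idx == b).mean() * abs(conf[idx == b].mean() - won[idx == b].mean())
        for b in range(n_bins) if (idx == b).any()
    )
    return {
        "brier": brier,
        "ece": float(ece),
        "avg_overconfidence": float(conf.mean() - won.mean()),  # predicted % - actual %
        "total_rated": int(conf.size),
    }
```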
📊 Reliability Diagram
🔵 dots = actual hit rate per confidence bin · gray dashed = perfect calibration · circles sized by sample count
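To reproduce the diagram offline, a matplotlib sketch; the `reliability_diagram` helper and its styling are approximations, not the page's actual rendering:

```python
import numpy as np
import matplotlib.pyplot as plt

def reliability_diagram(conf, won, n_bins=5):
    """Dots = actual hit rate per bin, sized by sample count; gray dashed = perfect calibration."""
    conf, won = np.asarray(conf, float), np.asarray(won, float)
    edges = np.linspace(0.5, 1.0, n_bins + 1)
    idx = np.clip(np.digitize(conf, edges) - 1, 0, n_bins - 1)
    pred, act, n = [], [], []
    for b in range(n_bins):
        m = idx == b
        if m.any():
            pred.append(conf[m].mean()); act.append(won[m].mean()); n.append(int(m.sum()))
    plt.plot([0.5, 1.0], [0.5, 1.0], "--", color="gray", label="perfect calibration")
    plt.scatter(pred, act, s=[20 * c for c in n], alpha=0.7, label="per-bin hit rate")
    plt.xlabel("predicted confidence"); plt.ylabel("actual win rate")
    plt.legend(); plt.show()
```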
📋 Per-Bin Calibration
Table columns: BIN · N · PREDICTED · ACTUAL · STATUS
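One plausible way the STATUS column could be derived; the `tol` and `min_n` thresholds here are assumed for illustration, not the page's exact rule:

```python
def bin_status(predicted, actual, n, tol=0.05, min_n=10):
    """Classify one calibration bin. tol and min_n are assumed thresholds."""
    if n < min_n:
        return "low sample"        # too few rated calls to judge the bin
    gap = predicted - actual
    if gap > tol:
        return "overconfident"     # model claims more than it delivers
    if gap < -tol:
        return "underconfident"    # model delivers more than it claims
    return "ok"
```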
🔧 If poorly calibrated
  • Consistently overconfident (predicted > actual): Reduce sizing on high-confidence calls by 30-50% (see the sizing sketch after this list). Run a Full Retrain: the model is likely overfitting.
  • Consistently underconfident (predicted < actual): You can size up on high-confidence calls. The model is being more conservative than its track record warrants.
  • Calibrated on low but not high confidence (or vice versa): Common when training data is unbalanced and most outcomes cluster near 50%. More data fixes it.
  • Brier Score > 0.25: Roughly equivalent to always guessing 50%. The model needs more training.
  • Brier Score 0.18-0.25: Reasonable for trading models. Most professional systems live here.
  • Brier Score < 0.15: Excellent. The model is well-calibrated.
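The sizing advice in the first bullet, expressed as a hedged sketch; `high_conf` and `haircut` are hypothetical parameters, with the 40% haircut sitting inside the suggested 30-50% band:

```python
def adjusted_size(base_size, confidence, overconfident, high_conf=0.75, haircut=0.40):
    """Cut size on high-confidence calls while the model runs overconfident."""
    if overconfident and confidence >= high_conf:
        return base_size * (1.0 - haircut)  # 40% reduction, inside the 30-50% band
    return base_size

# Example: a 100-unit position on an 85%-confidence call shrinks to 60 units
# while the calibration page shows the model running overconfident.
print(adjusted_size(100, 0.85, overconfident=True))  # -> 60.0
```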