Expected Calibration Error (ECE) = weighted average gap between predicted probability and actual win rate, across all bins. Lower is better:
- ECE < 0.03 → excellent (publication-grade)
- ECE 0.03 – 0.07 → good (production-ready)
- ECE 0.07 – 0.15 → fair (model is biased)
- ECE > 0.15 → poor (calibration needs work)
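A minimal sketch of how ECE can be computed, assuming equal-width probability bins and binary win/loss outcomes (the bin count and binning scheme are illustrative, not taken from the source):

```python
import numpy as np

def expected_calibration_error(probs, outcomes, n_bins=10):
    """Weighted average |actual win rate - mean predicted prob| over
    equal-width probability bins. Lower is better.

    probs    : predicted win probabilities in [0, 1]
    outcomes : actual results (1 = win, 0 = loss)
    """
    probs = np.asarray(probs, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Half-open bins [lo, hi); the last bin also includes 1.0.
        mask = (probs >= lo) & ((probs < hi) if hi < 1.0 else (probs <= hi))
        if not mask.any():
            continue
        weight = mask.mean()  # fraction of all samples landing in this bin
        gap = abs(outcomes[mask].mean() - probs[mask].mean())
        ece += weight * gap
    return ece
```

For example, a model that predicts 0.9 on samples that win only half the time gets an ECE of 0.4, squarely in the "poor" band above.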
Bin gap direction:
- If actual > predicted → the brain is under-confident in this probability range. Predictions in this bin would be better calibrated if scaled UP.
- If actual < predicted → the brain is over-confident in this probability range. Scale DOWN.
The Calibrator and IsotonicCalibrator already auto-correct for these gaps — the diagram here shows the FINAL post-calibration result. If the high-prob bins (0.8+) are still showing big gaps, the brain has more learning to do on confident predictions.
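The Calibrator and IsotonicCalibrator are project components not detailed here; as a rough illustration of what isotonic calibration does, the sketch below fits a monotone non-decreasing map from predicted probability to empirical win rate using the pool-adjacent-violators algorithm (a standard implementation choice, assumed rather than taken from the source):

```python
import numpy as np

def pav_calibrate(probs, outcomes):
    """Isotonic calibration via pool-adjacent-violators: returns
    calibrated probabilities that are monotone non-decreasing in the
    original predictions and match empirical win rates within blocks."""
    probs = np.asarray(probs, dtype=float)
    y = np.asarray(outcomes, dtype=float)
    order = np.argsort(probs)
    y_sorted = y[order]
    # Maintain blocks of (mean value, weight); merge while any
    # adjacent pair violates monotonicity.
    vals, wts = [], []
    for v in y_sorted:
        vals.append(v)
        wts.append(1.0)
        while len(vals) > 1 and vals[-2] > vals[-1]:
            w = wts[-2] + wts[-1]
            merged = (vals[-2] * wts[-2] + vals[-1] * wts[-1]) / w
            vals[-2:] = [merged]
            wts[-2:] = [w]
    fitted = np.repeat(vals, np.array(wts, dtype=int))
    out = np.empty_like(fitted)
    out[order] = fitted  # restore original sample order
    return out
```

scikit-learn's `IsotonicRegression` implements the same idea; the hand-rolled version here is only to make the mechanics visible.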