📊 Calibration buckets (10 bins)
For each bin of predicted probability, the BLUE bar shows the actual win rate. The GREEN line marks where actual equals predicted (perfect calibration). The closer the blue bar is to the green line, the better.
📚 Reading this
Expected Calibration Error (ECE) = weighted average gap between predicted probability and actual win rate, across all bins. Lower is better:
  • ECE < 0.03: excellent (publication-grade)
  • ECE 0.03–0.07: good (production-ready)
  • ECE 0.07–0.15: fair (model is biased)
  • ECE > 0.15: poor (calibration needs work)
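The binned ECE described above can be sketched in plain Python. This is a hypothetical helper for illustration, not this project's actual implementation:

```python
def ece(probs, outcomes, n_bins=10):
    """Expected Calibration Error over equal-width probability bins.

    probs: predicted win probabilities in [0, 1]
    outcomes: 1 for a win, 0 for a loss
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls into the top bin
        bins[idx].append((p, y))
    total = len(probs)
    err = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        avg_pred = sum(p for p, _ in members) / len(members)
        win_rate = sum(y for _, y in members) / len(members)
        # each bin's gap is weighted by the fraction of predictions it holds
        err += (len(members) / total) * abs(avg_pred - win_rate)
    return err
```

For example, a model that always says 0.9 but wins only half the time gets `ece([0.9] * 10, [1] * 5 + [0] * 5)` ≈ 0.4, deep in the "poor" band.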
Bin gap direction:
  • If actual > predicted → brain is under-confident in this prob range. Predictions in this bin would be better calibrated if scaled UP.
  • If actual < predicted → brain is over-confident in this prob range. Scale DOWN.
The Calibrator and IsotonicCalibrator already auto-correct for these gaps; the diagram here shows the FINAL post-calibration result. If the high-prob bins (0.8+) are still showing big gaps, the brain has more learning to do on confident predictions.
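The internals of this project's Calibrator and IsotonicCalibrator aren't shown here, but the standard core of isotonic calibration is a pool-adjacent-violators (PAV) fit: sort predictions by raw probability, then merge neighboring groups until the empirical win rates are non-decreasing. A generic sketch of that technique:

```python
def isotonic_fit(probs, outcomes):
    """Fit non-decreasing calibrated values via pool-adjacent-violators.

    Returns one calibrated win rate per input, aligned with probs.
    """
    order = sorted(range(len(probs)), key=lambda i: probs[i])
    stack = []  # blocks of [sum_of_outcomes, count], means non-decreasing
    for i in order:
        stack.append([float(outcomes[i]), 1])
        # pool while the previous block's mean exceeds the current block's mean
        while len(stack) > 1 and stack[-2][0] * stack[-1][1] > stack[-1][0] * stack[-2][1]:
            s, c = stack.pop()
            stack[-1][0] += s
            stack[-1][1] += c
    fitted = []
    for s, c in stack:
        fitted.extend([s / c] * c)  # every member of a block gets the block mean
    result = [0.0] * len(probs)
    for k, i in enumerate(order):
        result[i] = fitted[k]  # map back to the original input order
    return result
```

On a tiny sample with one violation, `isotonic_fit([0.1, 0.2, 0.3, 0.4], [0, 1, 0, 1])` pools the middle pair into a shared 0.5, giving [0.0, 0.5, 0.5, 1.0].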