Calibration = does the model say what it means? A well-calibrated model that outputs 70% confidence should actually be right about 70% of the time on that batch. If the model says 80% but only wins 60% of those, it's overconfident, and your sizing will be too aggressive. If it says 60% but wins 80%, it's underconfident, and you're leaving edge on the table.
This page bins predictions by confidence (50-60%, 60-70%, etc.) and plots actual win rate per bin against the diagonal. Closer to diagonal = better calibration.
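The binning step can be sketched as follows. This is a minimal illustration, not the page's actual implementation: the function name `calibration_bins` and the bin edges are assumptions, and inputs are predicted win probabilities plus 0/1 outcomes.

```python
import numpy as np

def calibration_bins(confidences, outcomes,
                     edges=(0.5, 0.6, 0.7, 0.8, 0.9, 1.0)):
    """Bin predictions by confidence and compute the actual win rate per bin.

    confidences: predicted win probabilities (e.g. 0.73)
    outcomes:    1 if the prediction won, 0 if it lost
    Returns a list of (bin_low, bin_high, count, actual_win_rate) tuples;
    actual_win_rate is None for empty bins.
    """
    conf = np.asarray(confidences, dtype=float)
    won = np.asarray(outcomes, dtype=float)
    rows = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Half-open bins [lo, hi), except the last bin also includes
        # its top edge so a 1.0-confidence prediction isn't dropped.
        is_last = hi == edges[-1]
        mask = (conf >= lo) & ((conf <= hi) if is_last else (conf < hi))
        n = int(mask.sum())
        rows.append((lo, hi, n, float(won[mask].mean()) if n else None))
    return rows
```

Plotting each bin's actual win rate against its midpoint, with the y = x diagonal for reference, gives the calibration curve described above.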