Brier Score: mean (predicted_prob โ actual_outcome)ยฒ. Bounded in [0, 1], lower is better. Zero = perfect, 0.25 = no information (constant 50% guess on 50/50 data).
Baseline: always predict the population mean rate. Brier of baseline depends on base rate: for 50/50 outcomes it's 0.25; for skewed outcomes (say 70% wins) it's 0.7 ร 0.3 = 0.21.
Skill Score: BSS = 1 โ (BS_model / BS_baseline). This normalizes against the no-information baseline. A skill score above 0 means the brain has measurable edge above randomly predicting the average rate.
Why BSS over raw accuracy: a model can have 70% accuracy on a 70/30 dataset just by always predicting "yes" โ its BSS would be ~0 since it's no better than the trivial baseline. BSS catches this; raw accuracy doesn't. BSS is the standard meteorological / forecasting metric for exactly this reason.