📚 What does this number mean?
BSS range     | Tier       | Interpretation
< 0           | broken     | The brain is WORSE than always guessing the mean. Something is wrong.
0.00 – 0.05   | weak       | Barely better than baseline. Probably noise.
0.05 – 0.10   | fair       | Real edge, but small. Hard to trust for sizing.
0.10 – 0.20   | useful     | Clear edge over baseline. Worth sizing with confidence.
0.20 – 0.30   | strong     | Substantially better than the naive baseline; institutional-grade.
≥ 0.30        | excellent  | Exceptional edge. Verify it's not data leakage.
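
A minimal sketch of the table as a lookup function. The name bss_tier is hypothetical, and because adjacent ranges share an endpoint, this version assumes each range includes its lower bound and excludes its upper bound (so exactly 0.05 counts as "fair").

```python
def bss_tier(bss: float) -> str:
    """Map a Brier Skill Score to a tier name from the table.

    Assumption: each range is lower-inclusive, upper-exclusive,
    e.g. 0.05 -> "fair", 0.30 -> "excellent".
    """
    if bss < 0.0:
        return "broken"     # worse than always guessing the mean
    if bss < 0.05:
        return "weak"
    if bss < 0.10:
        return "fair"
    if bss < 0.20:
        return "useful"
    if bss < 0.30:
        return "strong"
    return "excellent"      # double-check for data leakage


print(bss_tier(0.12))  # useful
```
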
๐Ÿ“ The math
Brier Score: the mean of (predicted_prob − actual_outcome)² over all predictions, where actual_outcome is 1 or 0. Bounded in [0, 1]; lower is better. 0 = perfect; 0.25 = no information on 50/50 data (a constant 50% guess).
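
A minimal sketch of the Brier score in plain Python, assuming predictions are probabilities in [0, 1] and outcomes are coded 1 (happened) or 0 (didn't). The function name brier_score is illustrative, not part of any existing codebase.

```python
from collections.abc import Sequence


def brier_score(predicted: Sequence[float], actual: Sequence[int]) -> float:
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need equal-length, non-empty inputs")
    return sum((p - o) ** 2 for p, o in zip(predicted, actual)) / len(predicted)


print(brier_score([1.0, 0.0], [1, 0]))                  # 0.0  (perfect)
print(brier_score([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0]))  # 0.25 (constant 50% guess)
```
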

Baseline: always predict the population mean rate (the base rate, p). The baseline's Brier score is p × (1 − p), so it depends on the base rate: for 50/50 outcomes it's 0.25; for skewed outcomes (say 70% wins) it's 0.7 × 0.3 = 0.21.
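
A short sketch confirming the baseline figure: forecasting the base rate p for every outcome gives p × (1 − p)², from the wins, plus (1 − p) × p², from the losses, which simplifies to p × (1 − p). The helper name baseline_brier is hypothetical, and the base rate here is taken from the same outcomes being scored.

```python
def baseline_brier(outcomes: list[int]) -> float:
    """Brier score of always predicting the base rate of `outcomes` (0/1 list)."""
    p = sum(outcomes) / len(outcomes)          # base rate, e.g. the win rate
    return sum((p - o) ** 2 for o in outcomes) / len(outcomes)


wins = [1] * 7 + [0] * 3                       # 70% wins
print(round(baseline_brier(wins), 4))          # 0.21, i.e. 0.7 * 0.3
```
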

Skill Score: BSS = 1 − (BS_model / BS_baseline). This normalizes against the no-information baseline: 1 is a perfect forecast, 0 matches the baseline, and negative values are worse than it. A skill score above 0 means the brain has a measurable edge over always predicting the average rate.
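
A minimal sketch of the skill score, assuming the baseline base rate is computed from the same outcomes being scored (a live system might use a longer-run population rate instead). The function name brier_skill_score is illustrative.

```python
def brier_skill_score(predicted: list[float], actual: list[int]) -> float:
    """BSS = 1 - BS_model / BS_baseline, where the baseline always predicts the base rate."""
    n = len(actual)
    base_rate = sum(actual) / n
    bs_model = sum((p - o) ** 2 for p, o in zip(predicted, actual)) / n
    bs_baseline = base_rate * (1 - base_rate)  # Brier score of the constant forecast
    if bs_baseline == 0:
        raise ValueError("all outcomes identical; BSS is undefined")
    return 1 - bs_model / bs_baseline


outcomes = [1, 1, 1, 0]
print(round(brier_skill_score([0.9, 0.8, 0.7, 0.2], outcomes), 2))  # 0.76
print(brier_skill_score([0.75] * 4, outcomes))                      # 0.0 (baseline itself)
```

The guard matters: if every outcome is identical, the baseline is already perfect (Brier 0) and the ratio is undefined.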

Why BSS over raw accuracy: on a 70/30 dataset, a model that always outputs 70%, and so always calls "yes", scores 70% accuracy, yet its BSS is 0 because it is exactly the trivial baseline. BSS catches this; raw accuracy doesn't. That is why BSS is a standard metric in meteorology and forecast verification.
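
A small sketch of the 70/30 example, assuming "yes" means a probability of at least 0.5 (the threshold is an assumption; the helpers accuracy and bss are hypothetical). It also shows that a model stating a hard 100% "yes" every time keeps 70% accuracy but scores a negative BSS, since its Brier score is 0.3 against a baseline of 0.21.

```python
def accuracy(predicted: list[float], actual: list[int], threshold: float = 0.5) -> float:
    """Fraction of outcomes where the thresholded call matches the result."""
    return sum((p >= threshold) == bool(o) for p, o in zip(predicted, actual)) / len(actual)


def bss(predicted: list[float], actual: list[int]) -> float:
    """Brier Skill Score against an always-predict-the-base-rate baseline."""
    n = len(actual)
    rate = sum(actual) / n
    bs = sum((p - o) ** 2 for p, o in zip(predicted, actual)) / n
    return 1 - bs / (rate * (1 - rate))


outcomes = [1] * 70 + [0] * 30
always_yes = [0.7] * 100   # always calls "yes" at the base-rate probability
hard_yes = [1.0] * 100     # always calls "yes" with full certainty

print(accuracy(always_yes, outcomes))     # 0.7
print(bss(always_yes, outcomes))          # ~0, up to floating-point noise
print(round(bss(hard_yes, outcomes), 3))  # -0.429
```
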