For ~20% of resolutions (sampled to keep cost low), the brain runs counterfactual replay (sketched in code after this list):
- Take the features that produced the prediction
- For each feature, perturb it by ±10% and re-predict
- Measure the maximum deviation across all 2×N perturbations (N = number of features)
- Compute robustness = 1 − max_dev / 0.5 (1.0 = no change; 0.0 = predictions swing to extremes)
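A minimal sketch of that replay loop, assuming a `predict` function that maps a numeric feature vector to a win probability in [0, 1]; the names here are illustrative, not the actual API:

```typescript
type Features = number[];
type Predict = (features: Features) => number; // assumed: returns a probability in [0, 1]

function counterfactualRobustness(features: Features, predict: Predict): number {
  const base = predict(features);
  let maxDev = 0;
  // 2×N perturbations: each of the N features nudged down and up by 10%
  for (let i = 0; i < features.length; i++) {
    for (const factor of [0.9, 1.1]) {
      const perturbed = [...features];
      perturbed[i] = features[i] * factor;
      maxDev = Math.max(maxDev, Math.abs(predict(perturbed) - base));
    }
  }
  // Normalize by 0.5 per the formula above; clamp so a swing larger
  // than 0.5 reports 0 rather than a negative score.
  return Math.max(0, 1 - maxDev / 0.5);
}
```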
Robustness score interpretation:
- ≥ 0.95 → rock-solid; small feature changes don't move the prediction
- 0.7–0.95 → healthy
- 0.5–0.7 → somewhat fragile (one perturbed feature could flip it)
- < 0.5 → knife-edge; the prediction would flip under normal noise
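For illustration, the thresholds above map directly to labels; the function name and label strings here are hypothetical:

```typescript
type RobustnessLabel = "rock-solid" | "healthy" | "somewhat fragile" | "knife-edge";

// Bucket a robustness score per the interpretation table above.
function interpretRobustness(r: number): RobustnessLabel {
  if (r >= 0.95) return "rock-solid";
  if (r >= 0.7) return "healthy";
  if (r >= 0.5) return "somewhat fragile";
  return "knife-edge";
}
```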
Why this matters: a 70% prediction with 0.95 robustness is much more trustworthy than a 70% prediction with 0.4 robustness. The model assigns both the same win probability, but one sits on a knife edge. If brittlePct is consistently above 30%, the model is overfitting to specific feature configurations and would benefit from more regularization (turn up label smoothing or the confidence penalty).
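A minimal sketch of that brittlePct check, assuming "brittle" means a robustness score below 0.5 (the knife-edge threshold above); both the cutoff and the helper name are assumptions:

```typescript
// Percentage of sampled resolutions whose robustness falls in the
// knife-edge range. The < 0.5 cutoff is an assumption taken from the
// interpretation table above.
function brittlePct(robustnessScores: number[]): number {
  if (robustnessScores.length === 0) return 0;
  const brittle = robustnessScores.filter((r) => r < 0.5).length;
  return (100 * brittle) / robustnessScores.length;
}
```

If this value stays above 30 over a rolling window of sampled resolutions, that is the signal to reach for stronger regularization.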