📚 How it works
For ~20% of resolutions (sampled to keep cost low), the brain runs counterfactual replay:
  1. Take the features that produced the prediction
  2. For each feature, perturb it by ±10% and re-predict
  3. Measure the maximum deviation across all 2×N perturbations
  4. Compute robustness = 1 − max_dev / 0.5 (1.0 = no change; 0.0 = predictions swing to extremes)
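The steps above can be sketched as follows. This is a minimal illustration, not the actual implementation: the function and parameter names are made up, and `predict` stands in for whatever model maps a feature dict to a probability in [0, 1].

```python
import random

def counterfactual_replay(features, predict, sample_rate=0.2):
    """Sketch of the replay loop described above (illustrative names).

    `features` is a dict of feature name -> numeric value; `predict`
    maps such a dict to a probability in [0, 1].
    """
    # Only ~20% of resolutions are replayed, to keep cost low
    if random.random() > sample_rate:
        return None

    base = predict(features)
    max_dev = 0.0
    for name, value in features.items():
        for factor in (0.9, 1.1):  # perturb each feature by ±10%
            perturbed = dict(features)
            perturbed[name] = value * factor
            max_dev = max(max_dev, abs(predict(perturbed) - base))

    # Deviation is normalized by 0.5 (a swing to either extreme);
    # clamp so robustness stays in [0, 1]
    return max(0.0, 1.0 - max_dev / 0.5)
```

A predictor that ignores its features scores a perfect 1.0; a predictor that swings hard under any single ±10% nudge scores near 0.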
Robustness score interpretation:
  • ≥ 0.95 — rock-solid, small feature changes don't move the prediction
  • 0.7 – 0.95 — healthy
  • 0.5 – 0.7 — somewhat fragile (one feature could flip it)
  • < 0.5 — knife-edge, prediction would flip with normal noise
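As a sketch, the interpretation table maps onto a simple bucketing function (the label strings here are illustrative, not an API):

```python
def robustness_label(score):
    """Bucket a robustness score per the interpretation table above."""
    if score >= 0.95:
        return "rock-solid"
    if score >= 0.7:
        return "healthy"
    if score >= 0.5:
        return "somewhat fragile"
    return "knife-edge"
```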
Why this matters: a 70% prediction with 0.95 robustness is much more trustworthy than a 70% prediction with 0.4 robustness. The model thinks both are equally likely to win, but one is on a knife edge. If brittlePct is consistently >30%, the model is overfitting to specific feature configurations and would benefit from MORE regularization (turn up label smoothing or confidence penalty).
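Assuming brittlePct measures the share of replayed predictions that land below the 0.5 knife-edge threshold (the source does not define it explicitly), the 30% check could look like this hypothetical helper:

```python
def brittle_pct(robustness_scores):
    """Hypothetical: percentage of replayed predictions below the
    knife-edge threshold (< 0.5 robustness)."""
    if not robustness_scores:
        return 0.0
    brittle = sum(1 for s in robustness_scores if s < 0.5)
    return 100.0 * brittle / len(robustness_scores)

# e.g. trigger the regularization warning when brittle_pct(scores) > 30.0
```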