Training on hard binary labels pushes the model toward saturation. The cross-entropy loss for a correct prediction p=0.99 against the hard label y=1 is -log(0.99) ≈ 0.01, yet the gradient still pulls the model toward p=1.00, which is unreachable. The result: overconfident predictions that don't generalize.
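To make the saturation concrete, here is a small illustrative snippet (not code from this project) that prints the loss and the logit-space gradient p − y for increasingly confident correct predictions; the loss shrinks but the gradient never reaches zero.

```js
// Illustrative arithmetic only, not code from this repo.
// Binary cross-entropy against a hard label y=1, for increasingly confident p.
const bce = (y, p) => -(y * Math.log(p) + (1 - y) * Math.log(1 - p));

for (const p of [0.9, 0.99, 0.999]) {
  // For a sigmoid output, d(loss)/d(logit) = p - y; with y=1 it stays negative,
  // so the optimizer keeps pushing p toward an unreachable 1.0.
  console.log(`p=${p}  loss=${bce(1, p).toFixed(4)}  grad=${(p - 1).toFixed(3)}`);
}
```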
Label smoothing (Szegedy et al. 2016, Müller et al. 2019) replaces hard labels with soft ones:
y_smoothed = y × (1 − ε) + ε / K
For binary classification (K=2) and ε=0.05:
- y = 1 → 1 × 0.95 + 0.025 = 0.975
- y = 0 → 0 × 0.95 + 0.025 = 0.025
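A minimal sketch of the transform in JavaScript; the helper and constant names (smoothLabel, EPSILON) are illustrative assumptions, not identifiers from the codebase.

```js
// Assumed names for illustration; the codebase may organize this differently.
const EPSILON = 0.05; // smoothing strength ε
const K = 2;          // number of classes (binary)

// y_smoothed = y * (1 - ε) + ε / K
function smoothLabel(y, eps = EPSILON, k = K) {
  return y * (1 - eps) + eps / k;
}

console.log(smoothLabel(1), smoothLabel(0)); // ≈ 0.975 and 0.025, matching the list above
```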
The model learns that even "wins" aren't perfectly certain, and stops trying to output 0.99+. Empirically this improves calibration and generalization.
Where it's applied: at every model.train(features, label) call site, meaning the main model trainer (model.js fullRetrain) and the per-horizon models (multi-horizon.js trainHorizon). Set ε to 0 to bypass it.
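A hedged sketch of what a call site could look like, building on the smoothLabel helper above; model.train, fullRetrain, and trainHorizon come from the text, but the loop and trainingSamples are placeholders, not the repo's actual trainer code.

```js
// Illustrative wiring only: each call site passes the smoothed label
// instead of the raw 0/1 label.
for (const { features, label } of trainingSamples) {
  model.train(features, smoothLabel(label)); // 1 → 0.975, 0 → 0.025 at ε=0.05
}
```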
How to tune: ε=0.05 is the standard starting point. Increase up to 0.10 if the Brier Skill Score stagnates while predictions remain overconfident; decrease ε to 0 to disable.
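The three regimes, spelled out against the assumed smoothLabel helper from the sketch above:

```js
console.log(smoothLabel(1, 0.05)); // ≈ 0.975  (default starting point)
console.log(smoothLabel(1, 0.10)); // ≈ 0.95   (stronger smoothing if BSS stagnates)
console.log(smoothLabel(1, 0));    // 1        (ε=0: smoothing disabled, hard label preserved)
```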