⚙ Configuration
Mixup is disabled by default because it generates extra training examples per resolution. Enable it when:
  • Sample size is small (<200 resolutions) and you want regularization
  • Reliability diagram shows the model is overconfident in specific regions
  • Counterfactual Replay shows brittlePct > 30%
α controls the Beta(α, α) distribution that λ is drawn from. α=0.2 → mostly λ near 0 or 1 (light mix). α=1.0 → uniform λ (heavy mix).
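To see how α shapes λ, here is a minimal sketch (numpy only; the sampler shown is illustrative, not necessarily the learner's own):

  import numpy as np

  rng = np.random.default_rng(0)
  for alpha in (0.2, 1.0):
      lam = rng.beta(alpha, alpha, size=10_000)
      # Draws near 0 or 1 mean one parent example dominates the mix.
      near_endpoint = np.mean((lam < 0.1) | (lam > 0.9))
      print(f"α={alpha}: {near_endpoint:.0%} of λ draws within 0.1 of an endpoint")

Beta(1, 1) is the uniform distribution, so exactly 20% of draws land in those endpoint bands; smaller α pushes more mass toward the endpoints.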
📚 How mixup works
Standard SGD trains on observed examples (x, y). Mixup (Zhang et al. 2018) generates additional synthetic examples by linearly interpolating between pairs:
  x_mix = λ × x_a + (1-λ) × x_b
  y_mix = λ × y_a + (1-λ) × y_b
  where λ ~ Beta(α, α)
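A concrete sketch of the interpolation step (names are illustrative; assumes x is a numeric vector and y is a scalar or one-hot label):

  import numpy as np

  def mixup_pair(x_a, y_a, x_b, y_b, alpha=0.2, rng=None):
      # One synthetic example from one pair, per Zhang et al. 2018.
      rng = rng or np.random.default_rng()
      lam = rng.beta(alpha, alpha)          # λ ~ Beta(α, α)
      x_mix = lam * x_a + (1 - lam) * x_b   # interpolate inputs
      y_mix = lam * y_a + (1 - lam) * y_b   # interpolate labels identically
      return x_mix, y_mix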
Effect: the model is trained to behave linearly between data points. This regularizes toward smoother decision boundaries: the prediction at a point halfway between two training examples should be roughly halfway between their predictions. Empirically this:
  • Reduces overfitting on small datasets
  • Improves robustness to small input perturbations (catches the same kind of brittleness Counterfactual Replay measures)
  • Improves calibration (synthetic intermediate labels train the model to output intermediate probabilities)
Pipeline: after the main training pass on resolved examples, the continuous-learner generates K=5 synthetic mixup examples and trains on them at a discounted sample weight (0.5×).
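A sketch of that post-pass, assuming a hypothetical per-example update hook model.train_on(x, y, sample_weight=...); substitute the learner's real update method:

  import numpy as np

  K = 5              # synthetic examples per post-pass
  MIX_WEIGHT = 0.5   # discounted sample weight for synthetic examples
  ALPHA = 0.2

  def mixup_post_pass(model, X, y, rng=None):
      # Runs after the main pass on the resolved examples (X, y).
      # model.train_on is hypothetical, not a confirmed API.
      rng = rng or np.random.default_rng()
      n = len(X)
      if n < 2:
          return  # interpolation needs at least one distinct pair
      for _ in range(K):
          i, j = rng.choice(n, size=2, replace=False)  # pick a random pair
          lam = rng.beta(ALPHA, ALPHA)
          x_mix = lam * X[i] + (1 - lam) * X[j]
          y_mix = lam * y[i] + (1 - lam) * y[j]
          model.train_on(x_mix, y_mix, sample_weight=MIX_WEIGHT)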