Standard SGD trains on observed examples (x, y). Mixup (Zhang et al. 2018) generates additional synthetic examples by linearly interpolating between pairs:
x_mix = λ × x_a + (1-λ) × x_b
y_mix = λ × y_a + (1-λ) × y_b
where λ ~ Beta(α, α)
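A minimal sketch of generating one mixup example with NumPy; the function name `mixup_pair` and the default α = 0.2 are illustrative, not part of the actual implementation:

```python
import numpy as np

def mixup_pair(x_a, y_a, x_b, y_b, alpha=0.2, rng=None):
    """Blend two training examples into one synthetic (x_mix, y_mix) pair."""
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)            # λ ~ Beta(α, α), lies in [0, 1]
    x_mix = lam * x_a + (1.0 - lam) * x_b   # interpolate inputs
    y_mix = lam * y_a + (1.0 - lam) * y_b   # interpolate (soft / one-hot) labels
    return x_mix, y_mix
```

With small α the Beta distribution concentrates near 0 and 1, so most synthetic examples stay close to one of the two originals; larger α pushes λ toward 0.5 and produces more aggressive blends.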
Effect: the model is trained to behave linearly between data points. This regularizes toward smoother decision boundaries: the prediction at a point halfway between two training examples should be roughly halfway between their predictions. Empirically this:
- Reduces overfitting on small datasets
- Improves robustness to small input perturbations (catches the same kind of brittleness Counterfactual Replay measures)
- Improves calibration (synthetic intermediate labels train the model to output intermediate probabilities)
Pipeline: after the main training pass on resolved examples, the continuous-learner generates K=5 synthetic mixup examples and trains on them with a discounted sample weight (0.5×). This step is disabled by default.
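A rough sketch of how this step could sit after the main pass; the `learner.train_on(x, y, sample_weight=...)` interface, the random pairing over the resolved batch, and the `enabled` flag are assumptions made for illustration, not the actual pipeline API:

```python
import numpy as np

K_MIXUP = 5          # synthetic examples per update (K=5)
MIXUP_WEIGHT = 0.5   # discounted sample weight for synthetic examples
MIXUP_ALPHA = 0.2    # Beta(α, α) concentration (illustrative value)

def mixup_pass(learner, resolved_x, resolved_y, enabled=False, rng=None):
    """After the main pass on resolved examples, optionally train on K mixup examples."""
    if not enabled or len(resolved_x) < 2:
        return                                   # disabled by default; need at least one pair
    rng = rng or np.random.default_rng()
    for _ in range(K_MIXUP):
        i, j = rng.choice(len(resolved_x), size=2, replace=False)
        lam = rng.beta(MIXUP_ALPHA, MIXUP_ALPHA)
        x_mix = lam * resolved_x[i] + (1.0 - lam) * resolved_x[j]
        y_mix = lam * resolved_y[i] + (1.0 - lam) * resolved_y[j]
        learner.train_on(x_mix, y_mix, sample_weight=MIXUP_WEIGHT)
```

The 0.5× weight keeps the synthetic examples from dominating the update: they nudge the model toward linear behavior between resolved examples without being treated as equally trustworthy as observed data.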